1
Xie G, Xie W, Gu G, Lin Z, Chen R, Liu S, Yu J. A vector projection similarity-based method for miRNA-disease association prediction. Anal Biochem 2024; 687:115431. [PMID: 38123111 DOI: 10.1016/j.ab.2023.115431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 12/06/2023] [Accepted: 12/15/2023] [Indexed: 12/23/2023]
Abstract
Many miRNA-disease association prediction models incorporate Gaussian interaction profile kernel similarity (GIPS). However, GIPS fails to consider the specificity of the miRNA-disease association matrix, in which elements with a value of 0 represent miRNA-disease relationships that have not yet been discovered. To address this issue and better account for the impact of known and unknown miRNA-disease associations on similarity, we propose a vector projection similarity-based method for miRNA-disease association prediction (VPSMDA). In VPSMDA, we introduce three projection rules, combine them with logistic functions for the miRNA-disease association matrix, and propose a vector projection similarity measure for miRNAs and diseases. By integrating the vector projection similarity matrix with the original one, we obtain improved miRNA and disease similarity matrices. Additionally, we construct a weight matrix using different numbers of neighbors to reduce the noise in the similarity matrix. In performance evaluation, both LOOCV and 5-fold CV experiments demonstrate that VPSMDA outperforms seven other state-of-the-art methods in AUC. Furthermore, in a case study, VPSMDA successfully predicted 10, 9, and 10 out of the top 10 associations for three important human diseases, respectively, and these predictions were confirmed by recent biomedical resources.
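The limitation of GIPS described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation K(i, j) = exp(-gamma * ||A_i - A_j||^2), with gamma normalised by the mean squared norm of the interaction profiles. The function name `gip_similarity`, the parameter `gamma_prime`, and the toy matrix are illustrative assumptions; this is plain GIPS, not the VPSMDA method.

```python
import math


def gip_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between the rows of `assoc`.

    `assoc` is a binary association matrix given as a list of rows.
    `gamma_prime` is a hypothetical name for the usual bandwidth parameter.
    """
    n = len(assoc)
    # The bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


# Toy matrix (assumed data): rows are miRNAs, columns are diseases.
A = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 0, 0],  # no known associations
    [0, 0, 0],  # no known associations
]
S = gip_similarity(A)
```

In this toy case the two miRNAs with all-zero profiles receive a similarity of 1 even though nothing is known about either of them, because GIPS treats every 0 as an observed value. That is the kind of artefact the abstract attributes to ignoring the meaning of zero entries.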
Affiliation(s)
- Guobo Xie
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Weijie Xie
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Guosheng Gu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Zhiyi Lin
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Ruibin Chen
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Shigang Liu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Junrui Yu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
2
Tian Z, Han C, Xu L, Teng Z, Song W. MGCNSS: miRNA-disease association prediction with multi-layer graph convolution and distance-based negative sample selection strategy. Brief Bioinform 2024; 25:bbae168. [PMID: 38622356 PMCID: PMC11018511 DOI: 10.1093/bib/bbae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/14/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024] [Open Access]
Abstract
Identifying disease-associated microRNAs (miRNAs) could help in understanding the underlying mechanisms of diseases, which promotes the development of new medicines. Recently, network-based approaches have been widely proposed for inferring potential associations between miRNAs and diseases. However, these approaches ignore the importance of different relations in meta-paths when learning the embeddings of miRNAs and diseases. Besides, they pay little attention to screening out reliable negative samples, which is crucial for improving prediction accuracy. In this study, we propose a novel approach named MGCNSS with multi-layer graph convolution and a high-quality negative sample selection strategy. Specifically, MGCNSS first constructs a comprehensive heterogeneous network by integrating miRNA and disease similarity networks with their known association relationships. Then, we employ multi-layer graph convolution to automatically capture meta-path relations of different lengths in the heterogeneous network and learn discriminative representations of miRNAs and diseases. After that, MGCNSS establishes a highly reliable negative sample set from the unlabeled sample set with the distance-based negative sample selection strategy. Finally, we train MGCNSS in an unsupervised manner and predict the potential associations between miRNAs and diseases. The experimental results demonstrate that MGCNSS outperforms all baseline methods on both balanced and imbalanced datasets. More importantly, we conduct case studies on colon neoplasms and esophageal neoplasms, further confirming the ability of MGCNSS to detect potential candidate miRNAs. The source code is publicly available on GitHub at https://github.com/15136943622/MGCNSS/tree/master.
Affiliation(s)
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Chenguang Han
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Lewen Xu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhixia Teng
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Wei Song
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
3
Chen R, Xie G, Lin Z, Gu G, Yu Y, Yu J, Liu Z. Predicting Microbe-Disease Associations Based on a Linear Neighborhood Label Propagation Method with Multi-order Similarity Fusion Learning. Interdiscip Sci 2024:10.1007/s12539-024-00607-0. [PMID: 38436840 DOI: 10.1007/s12539-024-00607-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 03/05/2024]
Abstract
Computational approaches employed for predicting potential microbe-disease associations often rely on similarity information between microbes and diseases. Therefore, it is important to obtain reliable similarity information by integrating multiple types of similarity information. However, existing similarity fusion methods do not consider multi-order fusion of similarity networks. To address this problem, a novel method of linear neighborhood label propagation with multi-order similarity fusion learning (MOSFL-LNP) is proposed to predict potential microbe-disease associations. Multi-order fusion learning comprises two parts: low-order global learning and high-order feature learning. Low-order global learning is used to obtain common latent features from multiple similarity sources. High-order feature learning relies on the interactions between neighboring nodes to identify high-order similarities and learn deeper interactive network structures. Coefficients are assigned to different high-order feature learning modules to balance the similarities learned from different orders and enhance the robustness of the fusion network. Overall, by combining low-order global learning with high-order feature learning, multi-order fusion learning can capture both the shared and unique features of different similarity networks, leading to more accurate predictions of microbe-disease associations. In comparison to six other advanced methods, MOSFL-LNP exhibits superior prediction performance in the leave-one-out cross-validation and 5-fold cross-validation frameworks. In the case study, the 10 predicted microbes associated with asthma and type 1 diabetes reach accuracy rates of 90% and 100%, respectively.
Affiliation(s)
- Ruibin Chen
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Guobo Xie
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Zhiyi Lin
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Guosheng Gu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Yi Yu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Junrui Yu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Zhenguo Liu
  - Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, China
4
Han GS, Gao Q, Peng LZ, Tang J. Hessian Regularized [Formula: see text]-Nonnegative Matrix Factorization and Deep Learning for miRNA-Disease Associations Prediction. Interdiscip Sci 2024; 16:176-191. [PMID: 38099958 DOI: 10.1007/s12539-023-00594-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 11/05/2023] [Accepted: 11/07/2023] [Indexed: 02/22/2024]
Abstract
Since the identification of microRNAs (miRNAs), empirical research has demonstrated their crucial involvement in the functioning of organisms. Investigating miRNAs significantly supports efforts to prevent, diagnose, and treat complex human diseases. Yet exploring every conceivable miRNA-disease association with conventional wet-lab experiments consumes significant resources and time. On the computational front, forecasting potential miRNA-disease connections serves as a valuable source of preliminary insights for medical investigators. We have therefore developed a novel matrix factorization model, Hessian-regularized [Formula: see text] nonnegative matrix factorization in combination with deep learning, for predicting associations between miRNAs and diseases, denoted as [Formula: see text]-NMF-DF. In particular, we introduce a novel iterative fusion approach to integrate all similarities. This method effectively diminishes the sparsity of the initial miRNA-disease association matrix. Additionally, we devise a mixed model framework that utilizes deep learning, matrix decomposition, and singular value decomposition to capture and depict the intricate nonlinear features of miRNAs and diseases. Through comparative analysis, similarity matrix fusion, data preprocessing, and parameter adjustment, the prediction performance of six matrix factorization methods is improved. The AUC and AUPR obtained by the new matrix factorization model under fivefold cross-validation are comparable to or better than those of other matrix factorization models. Finally, we select three diseases (lung tumor, bladder tumor, and breast tumor) for case analysis, and further extend the matrix factorization model based on deep learning. The results show that the hybrid algorithm combining matrix factorization with deep learning proposed in this paper can predict miRNAs related to different diseases with high accuracy.
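For orientation, the following sketch shows plain nonnegative matrix factorization with the standard multiplicative updates for the squared Frobenius error. It deliberately omits the Hessian regularization and the deep learning components named in the abstract, whose exact form is not given here. All function names, the seed, and the toy matrix are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize a nonnegative matrix v into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation keeps the updates well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Toy association matrix (assumed data): rows are miRNAs, columns are diseases.
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
W, H = nmf(V, rank=2)
scores = matmul(W, H)  # reconstructed association scores
```

The product `W H` fills in a score for every miRNA-disease pair, which is how factorization models of this family typically rank unobserved associations.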
Affiliation(s)
- Guo-Sheng Han
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Qi Gao
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Ling-Zhi Peng
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Jing Tang
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
5
Liang X, Guo M, Jiang L, Fu Y, Zhang P, Chen Y. Predicting miRNA-Disease Associations by Combining Graph and Hypergraph Convolutional Network. Interdiscip Sci 2024:10.1007/s12539-023-00599-3. [PMID: 38286905 DOI: 10.1007/s12539-023-00599-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 12/15/2023] [Accepted: 12/17/2023] [Indexed: 01/31/2024]
Abstract
miRNAs are important regulators for many crucial biological processes. Many recent studies have shown that miRNAs are closely related to various human diseases and can be potential biomarkers or therapeutic targets for some diseases, such as cancers. Therefore, accurately predicting miRNA-disease associations is of great importance for understanding and curing diseases. However, how to efficiently utilize the characteristics of miRNAs and diseases and the information on known miRNA-disease associations for prediction is still not fully explored. In this study, we propose a novel computational method for predicting miRNA-disease associations. The proposed method combines the graph convolutional network and the hypergraph convolutional network. The graph convolutional network is utilized to extract the information from miRNA-similarity data as well as disease-similarity data. Based on the representations of miRNAs and diseases learned by the graph convolutional network, we further use the hypergraph convolutional network to capture the complex high-order interactions in the known miRNA-disease associations. We conduct comprehensive experiments with different datasets and predictive tasks. The results show that the proposed method consistently outperforms several other state-of-the-art methods. We also discuss the influence of hyper-parameters and model structures on the performance of our method. Some case studies also demonstrate that the predictive results of the method can be verified by independent experiments.
Affiliation(s)
- Xujun Liang
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
  - National Clinical Research Center for Gerontology, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
- Ming Guo
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
  - National Clinical Research Center for Gerontology, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
- Longying Jiang
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
  - Department of Pathology, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
- Ying Fu
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
  - National Clinical Research Center for Gerontology, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
- Pengfei Zhang
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
  - National Clinical Research Center for Gerontology, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
- Yongheng Chen
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
  - National Clinical Research Center for Gerontology, Xiangya Hospital, Central South University, Xiangya Road, Changsha, 410008, China
6
Xie GB, Liu SG, Gu GS, Lin ZY, Yu JR, Chen RB, Xie WJ, Xu HJ. LUNCRW: Prediction of potential lncRNA-disease associations based on unbalanced neighborhood constraint random walk. Anal Biochem 2023; 679:115297. [PMID: 37619903 DOI: 10.1016/j.ab.2023.115297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 08/14/2023] [Accepted: 08/18/2023] [Indexed: 08/26/2023]
Abstract
Accumulating evidence suggests that long non-coding RNAs (lncRNAs) are associated with various complex human diseases. They can serve as disease biomarkers and hold considerable promise for the prevention and treatment of various diseases. The traditional random walk algorithms generally exclude the effect of non-neighboring nodes on random walking. In order to overcome the issue, the neighborhood constraint (NC) approach is proposed in this study for regulating the direction of the random walk by computing the effects of both neighboring nodes and non-neighboring nodes. Then the association matrix is updated by matrix multiplication for minimizing the effect of the false negative data. The heterogeneous lncRNA-disease network is finally analyzed using an unbalanced random walk method for predicting the potential lncRNA-disease associations. The LUNCRW model is therefore developed for predicting potential lncRNA-disease associations. The area under the curve (AUC) values of the LUNCRW model in leave-one-out cross-validation and five-fold cross-validation were 0.951 and 0.9486 ± 0.0011, respectively. Data from published case studies on three diseases, including squamous cell carcinoma, hepatocellular carcinoma, and renal cell carcinoma, confirmed the predictive potential of the LUNCRW model. Altogether, the findings indicated that the performance of the LUNCRW method is superior to that of existing methods in predicting potential lncRNA-disease associations.
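As background for the random-walk family this abstract builds on, here is a minimal sketch of a conventional random walk with restart on a single network. It is not the LUNCRW model: the neighborhood constraint and the unbalanced walk on a heterogeneous network are not reproduced. The function name, the restart probability of 0.3, and the toy path graph are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seeds`.

    `adj` is a symmetric adjacency matrix (list of rows); `seeds` is a list of
    node indices. Both the interface and the defaults are illustrative.
    """
    n = len(adj)
    # Column-normalise the adjacency matrix into transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
             for i in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 (assumed data), with node 0 as the seed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds=[0])
```

On this path graph the score decays with distance from the seed, so only nodes reachable through chains of neighbors accumulate probability. The abstract's neighborhood constraint is motivated by letting non-neighboring nodes influence the walk as well.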
Affiliation(s)
- Guo-Bo Xie
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Shi-Gang Liu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Guo-Sheng Gu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Zhi-Yi Lin
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Jun-Rui Yu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Rui-Bin Chen
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Wei-Jie Xie
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
- Hao-Jie Xu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China
7
Wang S, Li J, Wang D, Xu D, Jin J, Wang Y. Predicting Drug-Disease Associations Through Similarity Network Fusion and Multi-View Feature Projection Representation. IEEE J Biomed Health Inform 2023; 27:5165-5176. [PMID: 37527303 DOI: 10.1109/jbhi.2023.3300717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Predicting drug-disease associations (DDAs) through computational methods has become a prevalent trend in drug development because of their high efficiency and low cost. Existing methods usually focus on constructing heterogeneous networks by collecting multiple data resources to improve prediction ability. However, potential association possibilities of numerous unconfirmed drug-related or disease-related pairs are not sufficiently considered. In this article, we propose a novel computational model to predict new DDAs. First, a heterogeneous network is constructed, including four types of nodes (drugs, targets, cell lines, diseases) and three types of edges (associations, association scores, similarities). Second, an updating and merging-based similarity network fusion method, termed UM-SF, is presented to fuse various similarity networks with diverse weights. Finally, an intermediate layer-mediated multi-view feature projection representation method, termed IM-FP, is proposed to calculate the predicted DDA scores. This method uses multiple association scores to construct multi-view drug features, then projects them into disease space through the intermediate layer, where an intermediate layer similarity constraint is designed to learn the projection matrices. Results of comparative experiments reveal the effectiveness of our innovations. Comparisons with other state-of-the-art models by the 10-fold cross-validation experiment indicate our model's advantage on AUROC and AUPR metrics. Moreover, our proposed model successfully predicted 107 novel high-ranked DDAs.
8
Ai N, Liang Y, Yuan H, Ouyang D, Xie S, Liu X. GDCL-NcDA: identifying non-coding RNA-disease associations via contrastive learning between deep graph learning and deep matrix factorization. BMC Genomics 2023; 24:424. [PMID: 37501127 PMCID: PMC10373414 DOI: 10.1186/s12864-023-09501-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 07/02/2023] [Indexed: 07/29/2023] [Open Access]
Abstract
Non-coding RNAs (ncRNAs) have drawn wide attention in recent years because they play vital roles in life activities. As a good complement to wet-lab experimental methods, computational prediction methods can greatly reduce experimental costs. However, a high rate of false-negative data and insufficient use of multi-source information can affect the performance of computational prediction methods. Furthermore, many computational methods lack robustness and generalization across different datasets. In this work, we propose an effective end-to-end computing framework, called GDCL-NcDA, that combines deep graph learning and deep matrix factorization (DMF) with contrastive learning to identify latent ncRNA-disease associations on diverse multi-source heterogeneous networks (MHNs). The diverse MHNs include different similarity networks and proven associations among ncRNAs (miRNAs, circRNAs, and lncRNAs), genes, and diseases. First, GDCL-NcDA employs a deep graph convolutional network and multiple attention mechanisms to adaptively integrate the multiple sources of the MHNs and reconstruct the ncRNA-disease association graph. Then, GDCL-NcDA utilizes DMF to predict the latent disease-associated ncRNAs based on the reconstructed graphs to reduce the impact of the false negatives from the original associations. Finally, GDCL-NcDA uses contrastive learning (CL) to generate a contrastive loss on the reconstructed graphs and the predicted graphs to improve the generalization and robustness of the framework. The experimental results show that GDCL-NcDA outperforms highly related computational methods. Moreover, case studies demonstrate the effectiveness of GDCL-NcDA in identifying the associations between diverse ncRNAs and diseases.
Affiliation(s)
- Ning Ai
  - Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China
  - School of Computer Science and Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
- Yong Liang
  - Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China
  - Pazhou Laboratory (Huangpu), Guangzhou, 510555, Guangdong, China
- Haoliang Yuan
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Dong Ouyang
  - Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China
  - School of Computer Science and Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
- Shengli Xie
  - Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou, 510000, Guangdong, China
- Xiaoying Liu
  - Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, Guangdong, 519090, China
9
Xiang H, Guo R, Liu L, Guo T, Huang Q. MSIF-LNP: microbial and human health association prediction based on matrix factorization noise reduction for similarity fusion and bidirectional linear neighborhood label propagation. Front Microbiol 2023; 14:1216811. [PMID: 37389340 PMCID: PMC10303805 DOI: 10.3389/fmicb.2023.1216811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 05/25/2023] [Indexed: 07/01/2023] [Open Access]
Abstract
Studies have shown that microbes are closely related to human health. Clarifying the relationship between microbes and the diseases that cause health problems can provide new solutions for the treatment, diagnosis, and prevention of diseases, and provide strong protection for human health. A growing number of similarity fusion methods are available to predict potential microbe-disease associations. However, existing methods introduce noise in the process of similarity fusion. To address this issue, we propose a method called MSIF-LNP that can efficiently and accurately identify potential connections between microbes and diseases, and thus clarify the relationship between microbes and human health. This method is based on matrix factorization denoising similarity fusion (MSIF) and bidirectional linear neighborhood propagation (LNP) techniques. First, we use non-linear iterative fusion to obtain a similarity network for microbes and diseases by fusing the initial microbe and disease similarities, and then reduce noise by using matrix factorization. Next, we use the initial microbe-disease association pairs as label information to perform linear neighborhood label propagation on the denoised similarity network of microbes and diseases. This enables us to obtain a score matrix for predicting microbe-disease relationships. We evaluate the predictive performance of MSIF-LNP and seven other advanced methods through 10-fold cross-validation, and the experimental results show that MSIF-LNP outperforms the other seven methods in terms of AUC. In addition, the analysis of cystic fibrosis and obesity cases further demonstrates the predictive ability of this method in practical applications.
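The label propagation step described in this abstract can be sketched in its generic form: scores are repeatedly smoothed over a neighbor weight matrix while being pulled back toward the known associations. This sketch assumes the weight matrix is already given and row-normalised. In linear neighborhood propagation the weights are usually learned by reconstructing each node from its neighbors, and that step, the denoising, and the bidirectional combination are not shown. The names, `alpha=0.5`, and the toy data are assumptions.

```python
def label_propagation(weights, labels, alpha=0.5, iterations=200):
    """Iterate F <- alpha * W F + (1 - alpha) * Y starting from F = Y.

    `weights` is a row-normalised n x n neighbor weight matrix and `labels`
    is the n x m matrix of known associations (both lists of rows).
    """
    n, m = len(weights), len(labels[0])
    scores = [row[:] for row in labels]
    for _ in range(iterations):
        scores = [
            [alpha * sum(weights[i][k] * scores[k][j] for k in range(n))
             + (1 - alpha) * labels[i][j]
             for j in range(m)]
            for i in range(n)
        ]
    return scores


# Toy data (assumed): three microbes, two diseases.
W = [
    [0.0, 0.9, 0.1],  # microbe 0 is mostly similar to microbe 1
    [0.9, 0.0, 0.1],  # microbe 1 is mostly similar to microbe 0
    [0.5, 0.5, 0.0],
]
Y = [
    [1, 0],  # microbe 0 is known to be associated with disease 0
    [0, 0],  # nothing is known about microbe 1
    [0, 1],  # microbe 2 is known to be associated with disease 1
]
F = label_propagation(W, Y)
```

Microbe 1 has no known associations, yet it ends up with a clearly higher score for disease 0 than for disease 1 because most of its weight points to microbe 0. Ranking such propagated scores is how candidate associations are typically proposed.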
Affiliation(s)
- Hui Xiang
  - College of Physical Education, Southwest Forestry University, Kunming, Yunnan, China
- Rong Guo
  - College of Physical Education, Southwest Forestry University, Kunming, Yunnan, China
- Li Liu
  - College of Physical Education, Suzhou University, Suzhou, Anhui, China
- Tengjie Guo
  - College of Physical Education, Yunnan Normal University, Kunming, Yunnan, China
- Quan Huang
  - College of Physical Education, Southwest Forestry University, Kunming, Yunnan, China
10
Fan C, Ding M. Inferring pseudogene-MiRNA associations based on an ensemble learning framework with similarity kernel fusion. Sci Rep 2023; 13:8833. [PMID: 37258695 DOI: 10.1038/s41598-023-36054-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 05/28/2023] [Indexed: 06/02/2023] [Open Access]
Abstract
Accumulating evidence shows that pseudogenes can function as microRNA (miRNA) sponges and regulate gene expression. Mining potential interactions between pseudogenes and miRNAs will facilitate the clinical diagnosis and treatment of complex diseases. However, identifying their interactions through biological experiments is time-consuming and labor-intensive. In this study, an ensemble learning framework with similarity kernel fusion, named ELPMA, is proposed to predict pseudogene-miRNA associations. First, four pseudogene similarity profiles and five miRNA similarity profiles are measured based on biological and topological properties. Subsequently, a similarity kernel fusion method is used to integrate the similarity profiles. Then, the feature representation for pseudogenes and miRNAs is obtained by combining the pseudogene-pseudogene similarities and miRNA-miRNA similarities. Lastly, individual learners are trained on each training subset, and soft voting is used to yield the final decision based on the prediction results of the individual learners. k-fold cross-validation is implemented to evaluate the prediction performance of the ELPMA method. In addition, case studies are conducted on three investigated pseudogenes to validate the performance of ELPMA in predicting pseudogene-miRNA interactions. The experimental results show that the ELPMA model is a feasible and effective tool for predicting interactions between pseudogenes and miRNAs.
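The soft voting step mentioned in this abstract can be sketched as a weighted average of the positive-class probabilities produced by the individual learners. The function name `soft_vote`, the equal default weights, the 0.5 decision threshold, and the probabilities below are assumptions for illustration; the abstract does not state how ELPMA weights its learners.

```python
def soft_vote(probabilities, weights=None):
    """Average positive-class probabilities across learners.

    `probabilities[k][i]` is the probability that learner k assigns to
    sample i. `weights` optionally gives one non-negative weight per learner.
    """
    n_learners = len(probabilities)
    if weights is None:
        weights = [1.0] * n_learners  # assumption: all learners count equally
    total = sum(weights)
    n_samples = len(probabilities[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


# Made-up outputs of three learners for three candidate pairs.
learner_outputs = [
    [0.9, 0.4, 0.2],
    [0.7, 0.6, 0.1],
    [0.8, 0.3, 0.3],
]
fused = soft_vote(learner_outputs)
predicted = [int(p >= 0.5) for p in fused]
```

Unlike hard voting, which counts only each learner's thresholded decision, soft voting keeps the confidence of each learner, so a single borderline vote has limited influence on the fused score.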
Affiliation(s)
- Chunyan Fan
  - School of Computer Science and Engineering, Xi'an Technological University, Xi'an, 710021, China
- Mingchao Ding
  - School of Computer Science, Hubei University of Technology, Wuhan, 430068, China
11
Wang H, Han J, Li H, Duan L, Liu Z, Cheng H. CDA-SKAG: Predicting circRNA-disease associations using similarity kernel fusion and an attention-enhancing graph autoencoder. Math Biosci Eng 2023; 20:7957-7980. [PMID: 37161181 DOI: 10.3934/mbe.2023345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Circular RNAs (circRNAs) constitute a category of circular non-coding RNA molecules whose abnormal expression is closely associated with the development of diseases. As biological data become abundant, many computational prediction models have been used for circRNA-disease association prediction. However, existing prediction models ignore the non-linear information of circRNAs and diseases when fusing multi-source similarities. In addition, these models fail to take full advantage of the vital feature information of high-similarity neighbor nodes when extracting features of circRNAs or diseases. In this paper, we propose a deep learning model, CDA-SKAG, which introduces a similarity kernel fusion algorithm to integrate multi-source similarity matrices, capturing the non-linear information of circRNAs and diseases, and constructs a circRNA information space and a disease information space. The model embeds an attention-enhancing layer in the graph autoencoder to enhance the associations between nodes with higher similarity. A cost-sensitive neural network is introduced to address the imbalance between positive and negative samples, consequently improving the model's generalization capability. The experimental results show that CDA-SKAG outperforms existing circRNA-disease association prediction models. The results of the case studies on lung and cervical cancer suggest that CDA-SKAG can be utilized as an effective tool to assist in predicting circRNA-disease associations.
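One common way to make a classifier cost-sensitive under class imbalance is to weight the positive term of the binary cross-entropy loss. The sketch below illustrates that idea only. The abstract does not say which cost scheme CDA-SKAG uses, so the inverse-frequency weight, the function name `weighted_bce`, and the toy data are assumptions.

```python
import math


def weighted_bce(labels, probs, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the positive class scaled by `pos_weight`."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Toy batch (assumed data): one known association among four candidate pairs.
labels = [1, 0, 0, 0]
probs = [0.3, 0.2, 0.1, 0.4]

# Assumed scheme: weight positives by the negative-to-positive ratio.
n_pos = sum(labels)
n_neg = len(labels) - n_pos
pos_weight = n_neg / n_pos

loss_weighted = weighted_bce(labels, probs, pos_weight)
loss_plain = weighted_bce(labels, probs, 1.0)
```

With the weight applied, the poorly scored positive sample contributes three times as much to the loss as it does in the plain version, so a model trained on it is penalised more for missing rare known associations.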
Affiliation(s)
- Huiqing Wang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Jiale Han
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Haolin Li
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Liguo Duan
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Zhihao Liu
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Hao Cheng
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
12
|
Wang W, Chen H. Predicting miRNA-disease associations based on lncRNA-miRNA interactions and graph convolution networks. Brief Bioinform 2023; 24:6918743. [PMID: 36526276 DOI: 10.1093/bib/bbac495] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/08/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 12/23/2022]
Abstract
Increasing studies have proved that microRNAs (miRNAs) are critical biomarkers in the development of human complex diseases. Identifying disease-related miRNAs is beneficial to disease prevention, diagnosis and remedy. Based on the assumption that similar miRNAs tend to associate with similar diseases, various computational methods have been developed to predict novel miRNA-disease associations (MDAs). However, selecting proper features for similarity calculation is a challenging task because of data deficiencies in biomedical science. In this study, we propose a deep learning-based computational method named MAGCN to predict potential MDAs without using any similarity measurements. Our method predicts novel MDAs based on known lncRNA-miRNA interactions via graph convolution networks with multichannel attention mechanism and convolutional neural network combiner. Extensive experiments show that the average area under the receiver operating characteristic values obtained by our method under 2-fold, 5-fold and 10-fold cross-validations are 0.8994, 0.9032 and 0.9044, respectively. When compared with five state-of-the-art methods, MAGCN shows improvement in terms of prediction accuracy. In addition, we conduct case studies on three diseases to discover their related miRNAs, and find that all the top 50 predictions for all the three diseases have been supported by established databases. The comprehensive results demonstrate that our method is a reliable tool in detecting new disease-related miRNAs.
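The cross-validated AUC figures quoted in this abstract rest on a simple statistic: the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair. The sketch below computes that quantity per fold and averages it. The function names and the toy folds are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, one pair per held-out fold."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)


# Two toy folds of held-out labels and predicted scores (illustrative values).
folds = [
    ([1, 0], [0.9, 0.1]),
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
]
average = mean_auc(folds)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on full association matrices.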
|
13
|
Xie GB, Chen RB, Lin ZY, Gu GS, Yu JR, Liu ZG, Cui J, Lin LQ, Chen LC. Predicting lncRNA-disease associations based on combining selective similarity matrix fusion and bidirectional linear neighborhood label propagation. Brief Bioinform 2023; 24:6966536. [PMID: 36592062 DOI: 10.1093/bib/bbac595] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/31/2022] [Revised: 11/30/2022] [Accepted: 12/04/2022] [Indexed: 01/03/2023]
Abstract
Recent studies have revealed that long noncoding RNAs (lncRNAs) are closely linked to several human diseases, providing new opportunities for their use in detection and therapy. Many graph propagation and similarity fusion approaches can be used for predicting potential lncRNA-disease associations. However, existing similarity fusion approaches suffer from noise and self-similarity loss in the fusion process. To address these problems, a new prediction approach, termed SSMF-BLNP, based on organically combining selective similarity matrix fusion (SSMF) and bidirectional linear neighborhood label propagation (BLNP), is proposed in this paper to predict lncRNA-disease associations. In SSMF, self-similarity networks of lncRNAs and diseases are obtained by selective preprocessing and nonlinear iterative fusion. The fusion process assigns weights to each initial similarity network and introduces a unit matrix that can reduce noise and compensate for the loss of self-similarity. In BLNP, the initial lncRNA-disease associations are employed in both lncRNA and disease directions as label information for linear neighborhood label propagation. The propagation is then performed on the self-similarity network obtained from SSMF to derive the scoring matrix for predicting the relationships between lncRNAs and diseases. Experimental results showed that SSMF-BLNP performed better than seven other state-of-the-art approaches. Furthermore, a case study demonstrated up to 100% and 80% accuracy in 10 lncRNAs associated with hepatocellular carcinoma and 10 lncRNAs associated with renal cell carcinoma, respectively. The source code and datasets used in this paper are available at: https://github.com/RuiBingo/SSMF-BLNP.
Affiliation(s)
- Guo-Bo Xie
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Rui-Bin Chen
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Zhi-Yi Lin
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Guo-Sheng Gu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Jun-Rui Yu
  - School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
- Zhen-Guo Liu
  - Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, China
- Ji Cui
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, China
- Lie-Qing Lin
  - Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, 510000, China
- Lang-Cheng Chen
  - Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, 510000, China
|
14
|
Li P, Tiwari P, Xu J, Qian Y, Ai C, Ding Y, Guo F. Sparse regularized joint projection model for identifying associations of non-coding RNAs and human diseases. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
15
|
Chen Y, Wang J, Wang C, Liu M, Zou Q. Deep learning models for disease-associated circRNA prediction: a review. Brief Bioinform 2022; 23:6696465. [PMID: 36130259 DOI: 10.1093/bib/bbac364] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/03/2022] [Revised: 07/30/2022] [Accepted: 08/03/2022] [Indexed: 12/14/2022]
Abstract
Emerging evidence indicates that circular RNAs (circRNAs) can provide new insights and potential therapeutic targets for disease diagnosis and treatment. However, traditional biological experiments are expensive and time-consuming. Recently, the strong representation-learning ability of deep learning has made it a promising technology for predicting disease-associated circRNAs. In this review, we introduce the most popular circRNA-related databases and summarize three types of deep learning-based circRNA-disease association prediction methods: feature-generation-based, type-discrimination-based and hybrid methods. We further evaluate seven representative models on a benchmark with ground truth for both balanced and imbalanced classification tasks. In addition, we discuss the advantages and limitations of each type of method and highlight suggested applications for future research.
Affiliation(s)
- Yaojia Chen
  - College of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, China and the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jiacheng Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Chuyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Mingxin Liu
  - College of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, China
- Quan Zou
  - University of Electronic Science and Technology of China, China
|
16
|
Li W, Wang S, Xu J, Xiang J. Inferring Latent MicroRNA-Disease Associations on a Gene-Mediated Tripartite Heterogeneous Multiplexing Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3190-3201. [PMID: 35041612 DOI: 10.1109/tcbb.2022.3143770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
MicroRNAs (miRNAs) are a class of non-coding single-stranded RNA molecules, about 22 nucleotides long, encoded by endogenous genes. MiRNAs have been identified as differentially expressed in various cancers, and there is evidence that miRNA disorders are associated with a variety of complex diseases. Therefore, inferring potential miRNA-disease associations (MDAs) is very important for understanding the aetiology and pathogenesis of many diseases and is useful for disease diagnosis, prognosis and treatment. First, we fused multiple similarity subnetworks from multiple sources for miRNAs, genes and diseases, respectively, using multiplexing technology. Then, the three multiplexed biological subnetworks are connected through the extended binary associations to form a tripartite complete heterogeneous multiplexed network (Tri-HM). Finally, because the constructed Tri-HM network retains the subnetworks' original topology and biological functions and expands the binary associations and dependences between the three biological entities, rich neighbourhood information is obtained iteratively from neighbours by a non-equilibrium random walk. In global 5-fold cross-validation, our tri-HM-RWR model obtained an AUC value of 0.8657 and an AUPR value of 0.2139, which shows that our model can infer disease-related miRNAs more fully.
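The iterative neighbourhood propagation described here builds on the random walk with restart (RWR). The sketch below shows the standard single-network RWR only, as an assumption-laden simplification: the abstract's non-equilibrium walk on a tripartite multiplexed network is more elaborate, and the function name, restart probability, and toy graph are illustrative.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # leave isolated nodes as zero columns
    W = adj / col_sums                     # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # walker restarts uniformly at the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the seed (hypothetical data).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, seeds=[0])
```

Nodes closer to the seed receive higher steady-state probability, which is how a walk started from known disease-related nodes can rank unlabelled candidates.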
|
17
|
MHDMF: Prediction of miRNA-disease associations based on Deep Matrix Factorization with Multi-source Graph Convolutional Network. Comput Biol Med 2022; 149:106069. [PMID: 36115300 DOI: 10.1016/j.compbiomed.2022.106069] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/06/2022] [Revised: 07/31/2022] [Accepted: 08/27/2022] [Indexed: 11/24/2022]
Abstract
A growing number of works have proved that microRNAs (miRNAs) are crucial biomarkers in diverse bioprocesses affecting various diseases. As a good complement to high-cost wet experiment-based methods, numerous computational prediction methods have sprung up. However, challenges remain in handling the high rate of false-negative associations and in making effective use of multi-source information to find potential associations. In this work, we develop an end-to-end computational framework, called MHDMF, which integrates the multi-source information on a heterogeneous network to discover latent disease-miRNA associations. Since many false negatives exist among the miRNA-disease associations, MHDMF utilizes the multi-source Graph Convolutional Network (GCN) to correct the false-negative associations by reformulating the miRNA-disease association score matrix. The score matrix reformulation is based on different similarity profiles and known associations between miRNAs, genes, and diseases. Then, MHDMF employs Deep Matrix Factorization (DMF) to predict the miRNA-disease associations based on the reformulated miRNA-disease association score matrix. The experimental results show that the proposed framework outperforms highly related comparison methods by a large margin on tasks of miRNA-disease association prediction. Furthermore, case studies suggest that MHDMF could be a convenient and efficient tool and may supply a new way to think about miRNA-disease association prediction.
|
18
|
A message passing framework with multiple data integration for miRNA-disease association prediction. Sci Rep 2022; 12:16259. [PMID: 36171337 PMCID: PMC9519928 DOI: 10.1038/s41598-022-20529-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 09/14/2022] [Indexed: 11/08/2022]
Abstract
Micro RNA or miRNA is a highly conserved class of non-coding RNA that plays an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and finding potential drug targets. We propose a biologically-motivated data-driven approach for the miRNA-disease association prediction, which overcomes the data scarcity problem by exploiting information from multiple data sources. The key idea is to enrich the existing miRNA/disease-protein-coding gene (PCG) associations via a message passing framework, followed by the use of disease ontology information for further feature filtering. The enriched and filtered PCG associations are then used to construct the inter-connected miRNA-PCG-disease network to train a structural deep network embedding (SDNE) model. Finally, the pre-trained embeddings and the biologically relevant features from the miRNA family and disease semantic similarity are concatenated to form the pair input representations to a Random Forest classifier whose task is to predict the miRNA-disease association probabilities. We present large-scale comparative experiments, ablation, and case studies to showcase our approach's superiority. Besides, we make the model prediction results for 1618 miRNAs and 3679 diseases, along with all related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster assessments and future adoption.
|
19
|
Chen M, Zhang X, Ju Y, Liu Q, Ding Y. iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM. Math Biosci Eng 2022; 19:13829-13850. [PMID: 36654069 DOI: 10.3934/mbe.2022644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Biological sequence analysis is an important basic research work in the field of bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. By constructing a classifier for prediction, the input sequence feature vector is predicted and evaluated, and the knowledge of gene structure, function and evolution is obtained from a large amount of sequence information, which lays a foundation for researchers to carry out in-depth research. At present, many machine learning methods have been applied to biological sequence analysis such as RNA gene recognition and protein secondary structure prediction. As a biological sequence, RNA plays an important biological role in the encoding, decoding, regulation and expression of genes. The analysis of RNA data is currently carried out from the aspects of structure and function, including secondary structure prediction, non-coding RNA identification and functional site prediction. Pseudouridine (Ψ) is the most widespread and abundant RNA modification and has been discovered in a variety of RNAs. Accurately identifying Ψ sites in RNA sequences is essential for the study of related functional mechanisms and for disease diagnosis. At present, several computational approaches have been suggested as an alternative to experimental methods to detect Ψ sites, but there is still potential for improvement in their performance. In this study, we present a model based on twin support vector machine (TWSVM) for Ψ site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy methods to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced Ψ site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of Ψ site identification. iPseU-TWSVM can be a helpful tool to identify Ψ sites.
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Xin Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Qing Liu
  - Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
|
20
|
Ma M, Na S, Zhang X, Chen C, Xu J. SFGAE: a self-feature-based graph autoencoder model for miRNA-disease associations prediction. Brief Bioinform 2022; 23:6678419. [PMID: 36037084 DOI: 10.1093/bib/bbac340] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/14/2022] [Revised: 07/21/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022]
Abstract
Increasing evidence has suggested that microRNAs (miRNAs) are important biomarkers of various diseases. Numerous graph neural network (GNN) models have been proposed for predicting miRNA-disease associations. However, the existing GNN-based methods have an over-smoothing issue: the learned feature embeddings of miRNA nodes and disease nodes become indistinguishable when multiple GNN layers are stacked. This issue makes the performance of the methods sensitive to the number of layers, and significantly hurts the performance when more layers are employed. In this study, we resolve this issue with a novel self-feature-based graph autoencoder model, abbreviated as SFGAE. The key novelty of SFGAE is to construct miRNA-self embeddings and disease-self embeddings, and let them be independent of graph interactions between the two types of nodes. The novel self-feature embeddings enrich the information of typical aggregated feature embeddings, which aggregate the information from direct neighbors and hence heavily rely on graph interactions. SFGAE adopts a graph encoder with an attention mechanism to concatenate aggregated feature embeddings and self-feature embeddings, and adopts a bilinear decoder to predict links. Our experiments show that SFGAE achieves state-of-the-art performance. In particular, SFGAE improves the average AUC upon recent GAEMDA [1] on the benchmark datasets HMDD v2.0 and HMDD v3.2, and consistently performs better when fewer (e.g. 10%) training samples are used. Furthermore, SFGAE effectively overcomes the over-smoothing issue and performs stably well on deeper models (e.g. eight layers). Finally, we carry out case studies on three human diseases, colon neoplasms, esophageal neoplasms and kidney neoplasms, and perform a survival analysis using kidney neoplasm as an example. The results suggest that SFGAE is a reliable tool for predicting potential miRNA-disease associations.
Affiliation(s)
- Mingyuan Ma
  - Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
- Sen Na
  - International Computer Science Institute and Department of Statistics, University of California, Berkeley, Berkeley CA, USA
- Xiaolu Zhang
  - Department of Information Systems, City University of Hong Kong, Hong Kong, China
- Congzhou Chen
  - Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
- Jin Xu
  - Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
|
21
|
Ai C, Yang H, Ding Y, Tang J, Guo F. A multi-layer multi-kernel neural network for determining associations between non-coding RNAs and diseases. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
|
22
|
Yu L, Zheng Y, Ju B, Ao C, Gao L. Research progress of miRNA-disease association prediction and comparison of related algorithms. Brief Bioinform 2022; 23:6542222. [PMID: 35246678 DOI: 10.1093/bib/bbac066] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/25/2021] [Revised: 01/30/2022] [Accepted: 02/08/2022] [Indexed: 11/13/2022]
Abstract
With an in-depth understanding of noncoding ribonucleic acid (RNA), many studies have shown that microRNA (miRNA) plays an important role in human diseases. Because traditional biological experiments are time-consuming and laborious, new calculation methods have recently been developed to predict associations between miRNA and diseases. In this review, we collected various miRNA-disease association prediction models proposed in recent years and used two common data sets to evaluate the performance of the prediction models. First, we systematically summarized the commonly used databases and similarity data for predicting miRNA-disease associations, and then divided the various calculation models into four categories for summary and detailed introduction. In this study, two independent datasets (D5430 and D6088) were compiled to systematically evaluate 11 publicly available prediction tools for miRNA-disease associations. The experimental results indicate that the methods based on information dissemination and the method based on scoring function require shorter running time. The method based on matrix transformation often requires a longer running time, but the overall prediction result is better than the previous two methods. We hope that the summary of work related to miRNA and disease will provide comprehensive knowledge for predicting the relationship between miRNA and disease and contribute to advanced computation tools in the future.
Affiliation(s)
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Yujia Zheng
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Bingyi Ju
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Chunyan Ao
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an, China
|
23
|
Yang H, Ding Y, Tang J, Guo F. Inferring human microbe–drug associations via multiple kernel fusion on graph neural network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
|
24
|
Wang YT, Li L, Ji CM, Zheng CH, Ni JC. ILPMDA: Predicting miRNA-Disease Association Based on Improved Label Propagation. Front Genet 2021; 12:743665. [PMID: 34659364 PMCID: PMC8514753 DOI: 10.3389/fgene.2021.743665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Accepted: 08/30/2021] [Indexed: 12/21/2022]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have been demonstrated to be related to numerous complex human diseases. Considerable studies have suggested that miRNAs affect many complicated bioprocesses. Hence, the investigation of disease-related miRNAs by utilizing computational methods is warranted. In this study, we presented an improved label propagation for miRNA-disease association prediction (ILPMDA) method to observe disease-related miRNAs. First, we utilized similarity kernel fusion to integrate different types of biological information for generating miRNA and disease similarity networks. Second, we applied the weighted k-nearest known neighbor algorithm to update verified miRNA-disease association data. Third, we utilized improved label propagation in disease and miRNA similarity networks to make association prediction. Furthermore, we obtained final prediction scores by adopting an average ensemble method to integrate the two kinds of prediction results. To evaluate the prediction performance of ILPMDA, two types of cross-validation methods and case studies on three significant human diseases were implemented to determine the accuracy and effectiveness of ILPMDA. All results demonstrated that ILPMDA had the ability to discover potential miRNA-disease associations.
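The third and fourth steps of this abstract (propagating labels in both similarity networks, then averaging) can be sketched with the textbook label propagation update F <- alpha * S F + (1 - alpha) * Y. This is an illustrative simplification, not ILPMDA's improved variant: the row normalisation, `alpha`, the function names, and the toy matrices are all assumptions.

```python
import numpy as np


def row_normalize(S):
    """Scale each row of a similarity matrix to sum to 1 (zero rows stay zero)."""
    S = np.asarray(S, dtype=float)
    sums = S.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    return S / sums


def label_propagation(S, Y, alpha=0.5, n_iter=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y on a row-normalised similarity."""
    S = row_normalize(S)
    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F


def ensemble_scores(S_mirna, S_disease, A, alpha=0.5):
    """Average the scores propagated over the miRNA and disease networks."""
    A = np.asarray(A, dtype=float)
    F_m = label_propagation(S_mirna, A, alpha)        # rows: miRNAs
    F_d = label_propagation(S_disease, A.T, alpha).T  # propagate over diseases
    return (F_m + F_d) / 2.0


# Toy data (hypothetical): 3 miRNAs, 2 diseases; miRNA 1 has no known association.
S_m = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
S_d = np.array([[1.0, 0.5], [0.5, 1.0]])
A = np.array([[1, 0], [0, 0], [0, 1]])
scores = ensemble_scores(S_m, S_d, A)
```

In this toy case the unlabelled miRNA 1 is most similar to miRNA 0, so it scores higher for disease 0 than for disease 1, which is the kind of inference label propagation is meant to provide.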
Affiliation(s)
- Yu-Tian Wang
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Lei Li
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Cun-Mei Ji
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Chun-Hou Zheng
  - School of Artificial Intelligence, Anhui University, Hefei, China
- Jian-Cheng Ni
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
|
25
|
Li J, Liu T, Wang J, Li Q, Ning C, Yang Y. MvKFN-MDA: Multi-view Kernel Fusion Network for miRNA-disease association prediction. Artif Intell Med 2021; 118:102115. [PMID: 34412838 DOI: 10.1016/j.artmed.2021.102115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2020] [Revised: 05/13/2021] [Accepted: 05/21/2021] [Indexed: 12/01/2022]
Abstract
Predicting the associations between microRNAs (miRNAs) and diseases is of great significance for identifying miRNAs related to human diseases. Since it is time-consuming and costly to identify the association between miRNA and disease through biological experiments, computational methods are currently used as an effective supplement to identify the potential association between disease and miRNA. This paper presents a Multi-view Kernel Fusion Network (MvKFN) based prediction method (MvKFN-MDA) to address the problem of miRNA-disease association prediction. A novel multiple kernel fusion framework, MvKFN, is first proposed to effectively fuse similarity kernels of different views, constructed from different data sources, in a highly nonlinear way. Using MvKFNs, the base similarity kernels for miRNAs (such as sequence, functional, semantic and Gaussian profile kernels) and the base similarity kernels for diseases (such as semantic and Gaussian profile kernels) are nonlinearly fused into two integrated similarity kernels, one for miRNAs and one for diseases. Then, miRNA and disease feature representations are extracted from the miRNA and disease integrated similarity kernels, respectively. These features are then fed into a neural matrix completion framework which finally outputs the association prediction scores. The parameters of MvKFN-MDA are learned based on the known miRNA-disease association matrix in a supervised end-to-end way. We compare the proposed method with other state-of-the-art methods. The AUCs of our proposed method were superior to the existing methods in both 5-FCV and LOOCV on two open experimental datasets. Furthermore, 49, 48, and 47 of the top 50 predicted miRNAs for three high-risk human diseases, namely, colon cancer, lymphoma, and kidney cancer, respectively, are verified using experimental literature. Finally, 100% accuracy from the top 50 predicted miRNAs is achieved when breast cancer is used as a case study to evaluate the ability of MvKFN-MDA for predicting a new disease without any known related miRNAs.
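One of the base kernels named in this abstract, the Gaussian profile kernel, is commonly computed from the rows of the association matrix as K(i, j) = exp(-gamma * ||A_i - A_j||^2), with gamma set to a bandwidth divided by the mean squared row norm. The sketch below follows that widely used formulation; it is not taken from the MvKFN-MDA code, and `gamma_prime` and the toy matrix are assumptions.

```python
import numpy as np


def gip_kernel(A, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of A."""
    A = np.asarray(A, dtype=float)
    sq_norms = (A ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq  # bandwidth normalised by mean profile norm
    # Squared Euclidean distances between all pairs of interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy miRNA-by-disease matrix (hypothetical): rows 0 and 1 share a profile.
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
K_mirna = gip_kernel(A)       # similarity between miRNAs (rows)
K_disease = gip_kernel(A.T)   # similarity between diseases (columns)
```

Entities with identical interaction profiles get similarity 1, and similarity decays as the profiles diverge. Note that zeros are treated as true absences here, although they may simply be undiscovered associations.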
Affiliation(s)
- Jin Li
  - School of Software, Yunnan University, Kunming, China; Kunming Key Laboratory of Data Science and Intelligent Computing, Kunming, China
- Tao Liu
  - School of Software, Yunnan University, Kunming, China
- Jingru Wang
  - School of Software, Yunnan University, Kunming, China
- Qing Li
  - First Affiliated Hospital of Kunming Medical University, Kunming, China
- Chenxi Ning
  - School of Software, Yunnan University, Kunming, China
- Yun Yang
  - School of Software, Yunnan University, Kunming, China; Kunming Key Laboratory of Data Science and Intelligent Computing, Kunming, China
|
26
|
Xie G, Chen H, Sun Y, Gu G, Lin Z, Wang W, Li J. Predicting circRNA-Disease Associations Based on Deep Matrix Factorization with Multi-source Fusion. Interdiscip Sci 2021; 13:582-594. [PMID: 34185304 DOI: 10.1007/s12539-021-00455-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/23/2021] [Revised: 06/18/2021] [Accepted: 06/20/2021] [Indexed: 12/14/2022]
Abstract
Recently, circRNAs with covalently closed loops have been discovered to play important parts in the progression of diseases. Nevertheless, the study of circRNA-disease associations is highly dependent on biological experiments, which are time-consuming and expensive. Hence, a computational approach to predict circRNA-disease associations is urgently needed. In this paper, we present an approach based on deep matrix factorization with multi-source fusion (DMFMSF). In DMFMSF, several useful circRNA and disease similarities are selected and then combined by similarity kernel fusion. Then, linear and non-linear characteristics are mined using singular value decomposition (SVD) and deep matrix factorization to infer potential circRNA-disease associations. The performance of the proposed DMFMSF on two benchmark datasets is rigorously validated by leave-one-out cross-validation (LOOCV) and fivefold cross-validation (5-fold CV). The experimental results show that DMFMSF is superior to several existing computational approaches. In addition, five important diseases (hepatocellular carcinoma, breast cancer, acute myeloid leukemia, colorectal cancer, and coronary artery disease) were used in case studies. The results suggest that DMFMSF can be used as an accurate and efficient computational tool for predicting circRNA-disease associations.
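The linear part of this pipeline, extracting structure with SVD, can be illustrated by a rank-k reconstruction of the association matrix. The sketch below shows only that generic step under stated assumptions: the function name `truncated_svd_scores`, the chosen rank, and the toy matrix are hypothetical, and the deep matrix factorization component is not shown.

```python
import numpy as np


def truncated_svd_scores(A, rank):
    """Rank-`rank` approximation of A, usable as a linear association score matrix."""
    A = np.asarray(A, dtype=float)
    if not 1 <= rank <= min(A.shape):
        raise ValueError("rank must be between 1 and min(A.shape)")
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the top singular triplets: (U_k * s_k) @ Vt_k.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


# Toy circRNA-by-disease matrix (hypothetical values).
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1]])
scores = truncated_svd_scores(A, rank=2)
# Entries that are 0 in A but receive comparatively large reconstructed
# values would be treated as candidate associations.
```

Keeping only the leading singular directions smooths the matrix toward its dominant patterns, so rows with similar profiles lend score to each other's unobserved entries.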
Affiliation(s)
- Guobo Xie
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Hui Chen
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Yuping Sun
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Guosheng Gu
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Zhiyi Lin
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Weiming Wang
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
  - School of Science and Technology, The Open University of Hong Kong, Hong Kong, 999077, China
- Jianming Li
  - School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
|
27
|
Qian Y, Jiang L, Ding Y, Tang J, Guo F. A sequence-based multiple kernel model for identifying DNA-binding proteins. BMC Bioinformatics 2021; 22:291. [PMID: 34058979 PMCID: PMC8167993 DOI: 10.1186/s12859-020-03875-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/01/2020] [Accepted: 11/13/2020] [Indexed: 11/18/2022]
Abstract
BACKGROUND DNA-binding proteins (DBPs) play a pivotal role in biological systems. A growing number of researchers are studying their mechanisms and detection methods. Detecting DBPs with traditional experimental methods is time-consuming and resource-consuming. In recent years, machine learning methods have been used to detect DBPs. However, it is difficult to adequately describe the information of proteins when predicting DNA-binding proteins. In this study, we extract six features from the protein sequence and use Multiple Kernel Learning based on Centered Kernel Alignment to integrate these features. The integrated feature is fed into a Support Vector Machine to build a predictive model and detect new DBPs. RESULTS In our work, the PDB1075 and PDB186 data sets are employed to test our method. Our model obtains better results (accuracy) than other existing methods on PDB1075 ([Formula: see text]) and PDB186 ([Formula: see text]), respectively. CONCLUSION Multiple kernel learning can fuse the complementary information between different features. Compared with existing methods, our method achieves comparable or best results on benchmark data sets.
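The abstract above integrates features with multiple kernel learning based on centered kernel alignment. The sketch below shows only the alignment score itself, using the standard definition (Frobenius inner product of centered kernels, normalized by their norms); it does not reproduce the authors' kernel weighting, and the toy labels, features, and variable names are assumptions.

```python
import numpy as np

def center_kernel(k):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h

def centered_alignment(k1, k2):
    """Centered kernel alignment between two kernel matrices."""
    k1c, k2c = center_kernel(k1), center_kernel(k2)
    return np.sum(k1c * k2c) / (np.linalg.norm(k1c) * np.linalg.norm(k2c))

# Hypothetical toy problem: four samples with labels in {-1, +1}.
labels = np.array([1.0, 1.0, -1.0, -1.0])
ideal_kernel = np.outer(labels, labels)

# One informative one-dimensional feature and one uninformative kernel.
features = np.array([[0.9], [1.1], [-1.0], [-0.8]])
linear_kernel = features @ features.T
identity_kernel = np.eye(4)

informative_score = centered_alignment(linear_kernel, ideal_kernel)
uninformative_score = centered_alignment(identity_kernel, ideal_kernel)
```

A kernel whose alignment with the ideal label kernel is higher would typically receive a larger weight when several kernels are combined.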
Affiliation(s)
- Yuqing Qian
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, People's Republic of China
- Limin Jiang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, People's Republic of China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, People's Republic of China
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, People's Republic of China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, People's Republic of China
|
28
|
Tang X, Luo J, Shen C, Lai Z. Multi-view Multichannel Attention Graph Convolutional Network for miRNA-disease association prediction. Brief Bioinform 2021; 22:6271996. [PMID: 33963829 DOI: 10.1093/bib/bbab174] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 03/01/2021] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 12/11/2022]
Abstract
MOTIVATION In recent years, a growing number of studies have shown that microRNAs (miRNAs) play significant roles in the development of complex human diseases. Discovering the associations between miRNAs and diseases has become an important part of the discovery and treatment of disease. Since uncovering associations via traditional experimental methods is complicated and time-consuming, many computational methods have been proposed to identify potential associations. However, accurately determining potential associations between miRNAs and diseases from multisource data remains challenging. RESULTS In this study, we develop a Multi-view Multichannel Attention Graph Convolutional Network (MMGCN) to predict potential miRNA-disease associations. Unlike simple multisource information integration, MMGCN employs a GCN encoder to obtain the features of miRNAs and diseases in each similarity view separately. Moreover, MMGCN enhances the learned latent representations for association prediction by using multichannel attention, which adaptively learns the importance of different features. Empirical results on two datasets demonstrate that the MMGCN model achieves superior performance compared with nine state-of-the-art methods on most of the metrics. Furthermore, we demonstrate the effectiveness of the multichannel attention mechanism and the validity of multisource data in miRNA-disease association prediction. Case studies also indicate the ability of the method to discover new associations.
Affiliation(s)
- Xinru Tang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Cong Shen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Zihan Lai
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
|
29
|
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022]
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming. Thus, computational methods for predicting MDAs are urgently needed. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections of the feature space and the structural relationships of MDA data. The nodes of the graphs are represented by the disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity. Moreover, for the first time, we consider six tasks simultaneously on the MDA prediction problem, which ensures that under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that the MDA-GCNFTG method achieves satisfactory performance on all six tasks and is significantly superior to the classic machine learning methods and the state-of-the-art MDA prediction methods. Moreover, the effectiveness of GCNs via the graph sampling strategy and the feature and topology graph in MDA-GCNFTG has also been demonstrated. More importantly, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
Affiliation(s)
- Yanyi Chu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Xuhong Wang
  - School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
- Qiuying Dai
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Yanjing Wang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Qiankun Wang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, China
- Dennis Russell Salahub
  - Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
|
30
|
Chen Z, Shen Z, Zhang Z, Zhao D, Xu L, Zhang L. RNA-Associated Co-expression Network Identifies Novel Biomarkers for Digestive System Cancer. Front Genet 2021; 12:659788. [PMID: 33841514 PMCID: PMC8033200 DOI: 10.3389/fgene.2021.659788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/28/2021] [Accepted: 02/25/2021] [Indexed: 01/04/2023]
Abstract
Cancers of the digestive system are malignant diseases. Our study focused on colon cancer, esophageal cancer (ESCC), rectal cancer, gastric cancer (GC), and rectosigmoid junction cancer to identify possible biomarkers for these diseases. The transcriptome data were downloaded from The Cancer Genome Atlas (TCGA) database, and a network was constructed using the WGCNA algorithm. Two significant modules were found, and coexpression networks were constructed. CytoHubba was used to identify hub genes of the two networks. GO analysis suggested that the network genes were involved in metabolic processes, biological regulation, and membrane and protein binding. KEGG analysis indicated that the significant pathways were the calcium signaling pathway, fatty acid biosynthesis, and pathways in cancer and insulin resistance. Some of the most significant hub genes in the two networks were hsa-let-7b-3p, hsa-miR-378a-5p, hsa-miR-26a-5p, hsa-miR-382-5p, and hsa-miR-29b-2-5p, and SECISBP2L, NCOA1, HERC1, HIPK3, and MBNL1, respectively. These genes were predicted to be associated with the tumor prognostic reference for this patient population.
Affiliation(s)
- Zheng Chen
  - School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Zijie Shen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Zilong Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Da Zhao
  - School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Lijun Zhang
  - School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
|
31
|
Gao MM, Cui Z, Gao YL, Wang J, Liu JX. Multi-Label Fusion Collaborative Matrix Factorization for Predicting LncRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:881-890. [PMID: 32324583 DOI: 10.1109/jbhi.2020.2988720] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022]
Abstract
As we all know, science and technology are developing faster and faster. Many experts and scholars have demonstrated that human diseases are related to lncRNAs, but only a few associations have been confirmed, and many unknown associations remain to be found. Finding associations takes a lot of time, so an efficient way to predict the associations between lncRNAs and diseases is particularly important. In this paper, we propose a multi-label fusion collaborative matrix factorization (MLFCMF) approach for predicting lncRNA-disease associations (LDAs). Firstly, the lncRNA space and disease space are optimized by multi-label learning to enhance the intrinsic link between lncRNAs and diseases and to tap potential information. Multi-label learning can encode a variety of data information from the sample space. Secondly, to learn multi-label information in the data space, the fusion method is used to handle the relationship between multiple labels. More comprehensive information is obtained by weighing the effects of different labels. Adding the Gaussian interaction profile (GIP) kernel can increase the network similarity. Finally, the lncRNA-disease associations are predicted by collaborative matrix factorization. Ten-fold cross-validation is used to evaluate the MLFCMF method, which obtains an AUC value of 0.8612. The simulation experiment results include a detailed analysis of ovarian cancer, colorectal cancer, and lung cancer. These results indicate that MLFCMF is an effective model for predicting lncRNA-disease associations.
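The Gaussian interaction profile (GIP) kernel mentioned in the abstract above is commonly computed from the rows of a binary association matrix, with the bandwidth normalized by the mean squared norm of the profiles. The following sketch follows that common formulation; the function name, the default `gamma_prime`, and the toy matrix are assumptions, and the paper may use different settings.

```python
import numpy as np

def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel between the rows (interaction profiles) of `assoc`.

    Assumes `assoc` contains at least one known association, so the
    mean squared profile norm is non-zero.
    """
    sq_norms = np.sum(assoc ** 2, axis=1)
    gamma = gamma_prime / sq_norms.mean()
    # Squared Euclidean distance between every pair of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Hypothetical toy data: rows are lncRNAs, columns are diseases.
toy_assoc = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)

lnc_similarity = gip_kernel(toy_assoc)          # lncRNA-lncRNA similarity
disease_similarity = gip_kernel(toy_assoc.T)    # disease-disease similarity
```

Rows with identical profiles get similarity 1, and similarity decays as the profiles diverge.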
|
32
|
Xu D, Xu H, Zhang Y, Wang M, Chen W, Gao R. MDAKRLS: Predicting human microbe-disease association based on Kronecker regularized least squares and similarities. J Transl Med 2021; 19:66. [PMID: 33579301 PMCID: PMC7881563 DOI: 10.1186/s12967-021-02732-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/09/2020] [Accepted: 02/01/2021] [Indexed: 01/16/2023]
Abstract
BACKGROUND Microbes are closely related to human health and diseases. Identification of disease-related microbes is of great significance for revealing the pathological mechanism of human diseases and understanding the interaction mechanisms between microbes and humans, which is also useful for the prevention, diagnosis and treatment of human diseases. Considering that the known disease-related microbes are still insufficient, it is necessary to develop effective computational methods and reduce the time and cost of biological experiments. METHODS In this work, we developed a novel computational method called MDAKRLS to discover potential microbe-disease associations (MDAs) based on Kronecker regularized least squares. Specifically, we introduced the Hamming interaction profile similarity to measure the similarities of microbes and diseases, in addition to the Gaussian interaction profile kernel similarity. In addition, we introduced the Kronecker product to construct two kinds of Kronecker similarities between microbe-disease pairs. Then, we applied Kronecker regularized least squares with each of the Kronecker similarities to obtain prediction scores, and calculated the final prediction scores by integrating the contributions of the different similarities. RESULTS The AUC values achieved by MDAKRLS in global leave-one-out cross-validation and 5-fold cross-validation were 0.9327 and 0.9023 ± 0.0015, which were significantly higher than those of the five state-of-the-art methods used for comparison. Comparison results demonstrate that MDAKRLS has faster computing speed under the two kinds of frameworks. In addition, case studies of inflammatory bowel disease (IBD) and asthma further showed that 19 (IBD) and 19 (asthma) of the top 20 predicted disease-related microbes could be verified by previously published biological or medical literature. CONCLUSIONS All the evaluation results demonstrate that MDAKRLS has an effective and reliable prediction performance. It may be a useful tool for finding new disease-related microbes and helping biomedical researchers carry out follow-up studies.
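Kronecker regularized least squares, as named in the abstract above, can be solved without forming the full pairwise kernel by using the eigendecompositions of the two smaller kernels. The sketch below is a generic version of that well-known trick, not the MDAKRLS implementation; the function name, the regularization value, and the toy kernels are assumptions, and both kernels are assumed symmetric positive semidefinite.

```python
import numpy as np

def kron_rls(k_rows, k_cols, y, lam=1.0):
    """Kronecker RLS scores for every (row entity, column entity) pair.

    Equivalent to K (K + lam I)^-1 vec(y) with K = kron(k_rows, k_cols),
    but computed from the eigendecompositions of the two small kernels.
    """
    row_vals, row_vecs = np.linalg.eigh(k_rows)
    col_vals, col_vecs = np.linalg.eigh(k_cols)
    pair_vals = np.outer(row_vals, col_vals)   # eigenvalues of the Kronecker kernel
    shrink = pair_vals / (pair_vals + lam)     # spectral filter of regularized least squares
    projected = row_vecs.T @ y @ col_vecs      # labels in the joint eigenbasis
    return row_vecs @ (shrink * projected) @ col_vecs.T

# Hypothetical toy data: 3 microbes, 2 diseases.
microbe_kernel = np.array([[1.0, 0.5, 0.1],
                           [0.5, 1.0, 0.2],
                           [0.1, 0.2, 1.0]])
disease_kernel = np.array([[1.0, 0.3],
                           [0.3, 1.0]])
known = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])

scores = kron_rls(microbe_kernel, disease_kernel, known, lam=0.5)
```

The eigendecomposition route avoids building the (microbes × diseases)-squared Kronecker matrix, which is the usual reason this formulation is fast.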
Affiliation(s)
- Da Xu
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Hanxiao Xu
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Mingyi Wang
  - Department of Central Lab, Weihai Municipal Hospital, Cheeloo College of Medicine, Shandong University, Weihai, Shandong, China
- Wei Chen
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan, 250061, China
|
33
|
Ding Y, Lei X, Liao B, Wu FX. Machine learning approaches for predicting biomolecule-disease associations. Brief Funct Genomics 2021; 20:273-287. [PMID: 33554238 DOI: 10.1093/bfgp/elab002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022]
Abstract
Biomolecules, such as microRNAs, circRNAs, lncRNAs and genes, are functionally interdependent in human cells, and all play critical roles in diverse fundamental and vital biological processes. The dysregulation of such biomolecules can cause diseases. Identifying the associations between biomolecules and diseases can uncover the mechanisms of complex diseases, which is conducive to their diagnosis, treatment, prognosis and prevention. Because biological experimental methods are time-consuming and costly, many computational association prediction methods have been proposed in the past few years. In this study, we provide a comprehensive review of machine learning-based approaches for predicting disease-biomolecule associations with multi-view data sources. Firstly, we introduce some databases and general strategies for integrating multi-view data sources in the prediction models. Secondly, we discuss several feature representation methods for machine learning-based prediction models. Thirdly, we comprehensively review machine learning-based prediction approaches in three categories: basic machine learning methods, matrix completion-based methods and deep learning-based methods, while discussing their advantages and disadvantages. Finally, we provide some perspectives for further improving biomolecule-disease prediction methods.
Affiliation(s)
- Yulian Ding
  - Division of Biomedical Engineering at the University of Saskatchewan
- Xiujuan Lei
  - School of Computer Science at Shaanxi Normal University
- Bo Liao
  - School of Mathematics and Statistics at Hainan Normal University, Haikou, China
- Fang-Xiang Wu
  - College of Engineering and the Department of Computer Science at University of Saskatchewan
|
34
|
Wang H, Tang J, Ding Y, Guo F. Exploring associations of non-coding RNAs in human diseases via three-matrix factorization with hypergraph-regular terms on center kernel alignment. Brief Bioinform 2021; 22:6095847. [PMID: 33443536 DOI: 10.1093/bib/bbaa409] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 09/23/2020] [Revised: 11/05/2020] [Accepted: 12/11/2020] [Indexed: 12/25/2022]
Abstract
Accurately identifying associations between non-coding RNAs (ncRNAs) and diseases could be of great help in human biomedical research and treatment. However, traditional technology is applied to only one type of non-coding RNA or a specific disease, and the experimental method is time-consuming and expensive. More computational tools have been proposed to detect new associations based on known ncRNA and disease information. Because ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, it is critical to develop effective computational predictors for ncRNA-disease association prediction. In this paper, we propose a new computational method of three-matrix factorization with hypergraph regularization terms (HGRTMF) based on centered kernel alignment (CKA) for identifying general ncRNA-disease associations. In the process of constructing the similarity matrix, various types of similarity matrices are applicable to circRNAs, miRNAs and lncRNAs. Our method achieves excellent performance on five datasets involving three types of ncRNAs. In the test, we obtain best area under the curve scores of 0.9832, 0.9775, 0.9023, 0.8809 and 0.9185 via 5-fold cross-validation and 0.9832, 0.9836, 0.9198, 0.9459 and 0.9275 via leave-one-out cross-validation on the five datasets. Furthermore, our novel method (CKA-HGRTMF) is also able to discover new associations between ncRNAs and diseases accurately. Availability: Codes and data are available: https://github.com/hzwh6910/ncRNA2Disease.git. Contact: fguo@tju.edu.cn.
Affiliation(s)
- Hao Wang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
35
|
Zhang Z, Ding J, Xu J, Tang J, Guo F. Multi-Scale Time-Series Kernel-Based Learning Method for Brain Disease Diagnosis. IEEE J Biomed Health Inform 2021; 25:209-217. [PMID: 32248130 DOI: 10.1109/jbhi.2020.2983456] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/11/2022]
Abstract
Functional magnetic resonance imaging (fMRI) is a noninvasive technique for studying brain activity, with applications such as brain network analysis and automated diagnosis of neural diseases. However, many existing methods have drawbacks, such as limitations of graph theory, lack of global topology characteristics, local sensitivity of functional connectivity, and absence of temporal or context information. In addition to many numerical features, fMRI time series data also cover specific contextual knowledge and global fluctuation information. Here, we propose a multi-scale time-series kernel-based learning model for brain disease diagnosis, based on Jensen-Shannon divergence. First, we calculate correlation values within and between brain regions over time. In addition, we extract the multi-scale synergy expression probability distribution (interactional relation) between brain regions. We also produce the state transition probability distribution (sequential relation) on single brain regions. Then, we build a time-series kernel-based learning model based on Jensen-Shannon divergence to measure the similarity of brain functional connectivity. Finally, we provide an efficient system for brain network analysis and automated diagnosis of neural diseases. On the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, our proposed method achieves an accuracy of 0.8994 and an AUC of 0.8623. On the Major Depressive Disorder (MDD) dataset, our proposed method achieves an accuracy of 0.9166 and an AUC of 0.9263. Experiments show that our proposed method outperforms other existing automated diagnosis approaches for neural diseases. This shows that our prediction method achieves high accuracy in identifying brain diseases, comparable to existing outstanding prediction tools.
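The abstract above measures similarity between probability distributions with the Jensen-Shannon divergence. The sketch below computes the standard base-2 divergence, which lies between 0 and 1; turning it into a similarity as `1 - JSD` is an assumption made here for illustration and is not taken from the paper, as are the toy distributions.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_similarity(p, q):
    """Hypothetical similarity derived from the divergence (an assumption)."""
    return 1.0 - js_divergence(p, q)

# Hypothetical state-transition distributions for two subjects.
subject_a = [0.7, 0.2, 0.1]
subject_b = [0.6, 0.3, 0.1]
similarity_ab = js_similarity(subject_a, subject_b)
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric and stays finite when one distribution has zeros, which makes it convenient as a basis for a similarity measure.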
|
36
|
Zhou YK, Hu J, Shen ZA, Zhang WY, Du PF. LPI-SKF: Predicting lncRNA-Protein Interactions Using Similarity Kernel Fusions. Front Genet 2020; 11:615144. [PMID: 33362868 PMCID: PMC7758075 DOI: 10.3389/fgene.2020.615144] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/08/2020] [Accepted: 11/16/2020] [Indexed: 01/24/2023]
Abstract
Long non-coding RNAs (lncRNAs) play an important role in several biological activities, including transcription, splicing, translation, and some other cellular regulation processes. lncRNAs perform their biological functions by interacting with various proteins. Studies of lncRNA-protein interactions are of great value to the understanding of lncRNA functional mechanisms. In this paper, we propose a novel model to predict potential lncRNA-protein interactions using the SKF (similarity kernel fusion) and LapRLS (Laplacian regularized least squares) algorithms. We name this method LPI-SKF. Various similarities of both lncRNAs and proteins are integrated into LPI-SKF. LPI-SKF can be applied to predicting potential interactions involving novel proteins or lncRNAs. We obtained an AUROC (area under the receiver operating characteristic curve) of 0.909 in 5-fold cross-validation, which outperforms other state-of-the-art methods. A total of 19 out of the top 20 ranked interaction predictions were verified by existing data, which implies that LPI-SKF has great potential for discovering unknown lncRNA-protein interactions accurately. All data and codes of this work can be downloaded from a GitHub repository (https://github.com/zyk2118216069/LPI-SKF).
Affiliation(s)
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
|
37
|
Ao C, Zhou W, Gao L, Dong B, Yu L. Prediction of antioxidant proteins using hybrid feature representation method and random forest. Genomics 2020; 112:4666-4674. [DOI: 10.1016/j.ygeno.2020.08.016] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/23/2020] [Revised: 08/10/2020] [Accepted: 08/13/2020] [Indexed: 12/19/2022]
|
38
|
Fan C, Lei X, Pan Y. Prioritizing CircRNA-Disease Associations With Convolutional Neural Network Based on Multiple Similarity Feature Fusion. Front Genet 2020; 11:540751. [PMID: 33193615 PMCID: PMC7525185 DOI: 10.3389/fgene.2020.540751] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/25/2020] [Accepted: 08/12/2020] [Indexed: 12/15/2022]
Abstract
Accumulating evidence shows that circular RNAs (circRNAs) have significant roles in human health and in the occurrence and development of diseases. Biological researchers have identified disease-related circRNAs that could be considered as potential biomarkers for clinical diagnosis, prognosis, and treatment. However, identification of circRNA–disease associations using traditional biological experiments is still expensive and time-consuming. In this study, we propose a novel method named MSFCNN for the task of circRNA–disease association prediction, involving two-layer convolutional neural networks on a feature matrix that fuses multiple similarity kernels and interaction features among circRNAs, miRNAs, and diseases. First, four circRNA similarity kernels and seven disease similarity kernels are constructed based on the biological or topological properties of circRNAs and diseases. Subsequently, the similarity kernel fusion method is used to integrate the similarity kernels into one circRNA similarity kernel and one disease similarity kernel, respectively. Then, a feature matrix for each circRNA–disease pair is constructed by integrating the fused circRNA similarity kernel and fused disease similarity kernel with interactions and features among circRNAs, miRNAs, and diseases. The features of circRNA–miRNA and disease–miRNA interactions are selected using principal component analysis. Finally, taking the constructed feature matrix as input, we use two-layer convolutional neural networks to predict circRNA–disease association labels and mine potential novel associations. Five-fold cross validation shows that our proposed model outperforms conventional machine learning methods, including support vector machine, random forest, and multilayer perceptron approaches. Furthermore, case studies of predicted circRNAs for specific diseases and the top predicted circRNA–disease associations are analyzed. The results show that the MSFCNN model could be an effective tool for mining potential circRNA–disease associations.
Affiliation(s)
- Chunyan Fan
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yi Pan
  - Department of Computer Science, Georgia State University, Atlanta, GA, United States
|
39
|
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is a phenotype positively associated with type 2 diabetes mellitus (T2DM), but the causal relationship between them is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and T2DM prevention in young at-risk populations. MATERIALS AND METHODS A two-sample MR, using genetic instrumental variables (IVs) to explore the causal effect, was applied to test the influence of IL on the risk of T2DM. In this study, MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted (IVW) method to assess the risk that shorter IL brings to T2DM. Sensitivity validation was conducted to identify the effect of individual SNPs. MR-Egger regression was used to detect pleiotropic bias of IVs. RESULTS The pooled odds ratio from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), the intercept was low (-0.477, P = 0.252), and the ORs fluctuated only slightly in leave-one-out validation, from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03). CONCLUSION We validated that shorter IL causes no additional risk of T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that the above result was not due to any heterogeneity or pleiotropic effect of IVs.
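The relative fluctuations quoted in the abstract above can be checked directly, and the inverse-variance weighted (IVW) pooling it mentions follows a simple weighted-mean formula. In the sketch below, the two odds ratios and the pooled value 1.03 come from the abstract; the function names are assumptions, and the IVW helper is the generic fixed-effect formula rather than the study's actual analysis code.

```python
import math

def relative_change(or_value, pooled_or):
    """Relative deviation of a leave-one-out OR from the pooled OR."""
    return (or_value - pooled_or) / pooled_or

def ivw_pooled(log_ors, std_errs):
    """Fixed-effect inverse-variance weighted mean of per-SNP log odds ratios."""
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Values quoted in the abstract: pooled OR 1.03, leave-one-out ORs 0.966 and 1.081.
lowest = relative_change(0.966, 1.03)
highest = relative_change(1.081, 1.03)
print(round(lowest, 3), round(highest, 2))
```

Both deviations are within roughly 6% of the pooled estimate, which is the sense in which the abstract calls the leave-one-out fluctuation small.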
Affiliation(s)
- He Zhuang
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
  - HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
- Shuo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
  - HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
  - Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada
  - Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China
  - Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
|
40
|
Fan Y, Chen M, Zhu Q, Wang W. Inferring Disease-Associated Microbes Based on Multi-Data Integration and Network Consistency Projection. Front Bioeng Biotechnol 2020; 8:831. [PMID: 32850711 PMCID: PMC7418576 DOI: 10.3389/fbioe.2020.00831] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/19/2020] [Accepted: 06/29/2020] [Indexed: 12/18/2022]
Abstract
The many microbes in the human body play a vital role in cell physiology. In recent years, accumulating evidence has indicated that microbes are closely related to many complex human diseases. In-depth investigation of disease-associated microbes can contribute to understanding the pathogenesis of diseases and thus provide novel strategies for the treatment, diagnosis, and prevention of diseases. To date, many computational models have been proposed for predicting microbe-disease associations using available similarity networks. However, these similarity networks are not effectively fused. In this study, we proposed a novel computational model based on multi-data integration and network consistency projection for Human Microbe-Disease Associations Prediction (HMDA-Pred), which fuses multiple similarity networks by a linear network fusion method. HMDA-Pred yielded AUC values of 0.9589 and 0.9361 ± 0.0037 in leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, in case studies, 10, 8, and 10 of the top 10 predicted microbes for asthma, colon cancer, and inflammatory bowel disease, respectively, were confirmed by the literature.
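The two ingredients named in this abstract, linear fusion of similarity networks and network consistency projection, can be sketched roughly as below. The abstract does not give the formulas, so this is only an assumed form: fusion is taken to be a weighted sum of similarity matrices, and the projection follows a formulation commonly used in the network consistency projection literature (projecting a similarity row onto an association column and vice versa). All function names, weights, and toy matrices are hypothetical.

```python
import math

def linear_fusion(matrices, weights):
    """Weighted sum of same-shaped similarity matrices (assumed fusion rule)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(cols)] for i in range(rows)]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _norm(v):
    return math.sqrt(_dot(v, v))

def ncp_scores(assoc, sim_rows, sim_cols):
    """Network consistency projection scores for every (row, column) pair.

    assoc:    n x m 0/1 association matrix (rows = microbes, cols = diseases)
    sim_rows: n x n similarity among row entities
    sim_cols: m x m similarity among column entities
    """
    n, m = len(assoc), len(assoc[0])
    scores = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            col_j = [assoc[k][j] for k in range(n)]
            sim_j = [sim_cols[k][j] for k in range(m)]
            # Projection of row i's similarity profile onto association column j.
            p_row = _dot(sim_rows[i], col_j) / _norm(col_j) if _norm(col_j) else 0.0
            # Projection of column j's similarity profile onto association row i.
            p_col = _dot(assoc[i], sim_j) / _norm(assoc[i]) if _norm(assoc[i]) else 0.0
            denom = _norm(sim_rows[i]) + _norm(sim_j)
            scores[i][j] = (p_row + p_col) / denom if denom else 0.0
    return scores

identity = [[1.0, 0.0], [0.0, 1.0]]
print(ncp_scores([[1, 0], [0, 1]], identity, identity))
```

With identity similarities the scores simply reproduce the known associations, which is a quick sanity check that the projection does not invent links when no entity resembles any other.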
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
|
41
|
Huang YA, Chan KCC, You ZH, Hu P, Wang L, Huang ZA. Predicting microRNA-disease associations from lncRNA-microRNA interactions via Multiview Multitask Learning. Brief Bioinform 2020; 22:5868072. [PMID: 32633319 DOI: 10.1093/bib/bbaa133] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/15/2020] [Revised: 05/26/2020] [Accepted: 06/01/2020] [Indexed: 01/16/2023]
Abstract
MOTIVATION Identifying microRNAs that are associated with different diseases as biomarkers is a problem of great medical significance. Existing computational methods for uncovering such microRNA-disease associations (MDAs) are mostly developed under the assumption that similar microRNAs tend to associate with similar diseases. Since such an assumption is not always valid, these methods may not be applicable to all kinds of MDAs. Considering that the relationship between long noncoding RNA (lncRNA) and different diseases and the co-regulation relationships between the biological functions of lncRNA and microRNA have been established, we propose here a multiview multitask method that makes use of known lncRNA-microRNA interactions to predict MDAs on a large scale. The investigation is performed in the absence of complete information on microRNAs and of any similarity measurement for them and, to the best of our knowledge, the work represents the first attempt to discover MDAs based on lncRNA-microRNA interactions. RESULTS In this paper, we develop a deep learning model called MVMTMDA that can create a multiview representation of microRNAs. The model is trained with an end-to-end multitask learning approach so that missing data in the side information can be determined automatically. Experimental results show that the proposed model yields an average area under the ROC curve of 0.8410 ± 0.018, 0.8512 ± 0.012 and 0.8521 ± 0.008 when k is set to 2, 5 and 10, respectively. In addition, we propose a statistical approach to predicting lncRNA-disease associations based on these associations and the MDAs discovered using MVMTMDA. AVAILABILITY Python code and the datasets used in our studies are made available at https://github.com/yahuang1991polyu/MVMTMDA/.
Affiliation(s)
- Yu-An Huang
- Department of Computing at the Hong Kong Polytechnic University
- Keith C C Chan
- Systems Design Engineering from the University of Waterloo, Canada
- Pengwei Hu
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong
- Lei Wang
- China University of Mining and Technology
|
42
|
Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2468789. [PMID: 32566672 PMCID: PMC7275950 DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]
Abstract
Fungi play essential roles in many ecological processes, and taxonomic classification is fundamental for microbial community characterization and vital for the study and preservation of fungal biodiversity. To cope with massive fungal barcode data, tools that can handle extensive volumes of barcode sequences, especially the internal transcribed spacer (ITS) region, are necessary. However, high variation in the ITS region and the computational requirements for processing high-dimensional features remain challenging for existing predictors. In this study, we developed Its2vec, a bioinformatics tool for the classification of fungal ITS barcodes to the species level. An ITS database covering more than 25,000 species in a broad range of fungal taxa was assembled. For dimensionality reduction, a word embedding algorithm was used to represent an ITS sequence as a dense low-dimensional vector. A random forest-based classifier was built for species identification. Benchmarking results showed that our model achieved an accuracy comparable to that of several state-of-the-art predictors and, more importantly, that it could handle large datasets and greatly reduce dimensionality. We expect the Its2vec model to be helpful for fungal species identification and, thus, for revealing microbial community structures and deepening our understanding of their functional mechanisms.
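The idea of representing a sequence as a dense vector via word embeddings can be illustrated with a small sketch. The abstract does not say how sequences are split into words or how word vectors are combined, so the overlapping k-mer tokenization and mean pooling below are assumptions, and the tiny embedding table is a made-up stand-in for a trained model. The resulting fixed-length vector is the kind of input a random forest classifier could consume.

```python
def kmer_words(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mers ("words")."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mean_embedding(words, table, dim):
    """Average the vectors of known words; unknown words are skipped."""
    known = [table[w] for w in words if w in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]

# Hypothetical 2-dimensional embedding table, for illustration only.
toy_table = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0]}
words = kmer_words("acgt", k=3)
print(words, mean_embedding(words, toy_table, dim=2))
```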
|
43
|
Yuan L, Guo F, Wang L, Zou Q. Prediction of tumor metastasis from sequencing data in the era of genome sequencing. Brief Funct Genomics 2020; 18:412-418. [PMID: 31204784 DOI: 10.1093/bfgp/elz010] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/28/2019] [Revised: 02/22/2019] [Accepted: 04/26/2019] [Indexed: 02/01/2023]
Abstract
Tumor metastasis is the key reason for the high mortality rate of tumors. A growing number of scholars have begun to pay attention to research on tumor metastasis and have achieved satisfactory results in this field. The advent of the era of sequencing has enabled us to study cancer metastasis at the molecular level, which is essential for understanding the molecular mechanism of metastasis, identifying diagnostic markers and therapeutic targets, and guiding clinical decision-making. We reviewed metastasis-related studies that use sequencing data, covering detection of metastasis origin sites, determination of metastasis potential and identification of distal metastasis sites. These findings include the discovery of relevant markers and the presentation of prediction tools. Finally, we discussed the challenges of studying metastasis: the difficulty of obtaining metastatic cancer data, the complexity of tumor heterogeneity and the uncertainty of sample labels.
Affiliation(s)
- Linlin Yuan
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Lei Wang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
44
|
Zhang Y, Chen M, Cheng X, Wei H. MSFSP: A Novel miRNA-Disease Association Prediction Model by Federating Multiple-Similarities Fusion and Space Projection. Front Genet 2020; 11:389. [PMID: 32425980 PMCID: PMC7204399 DOI: 10.3389/fgene.2020.00389] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/28/2020] [Accepted: 03/27/2020] [Indexed: 12/11/2022]
Abstract
Growing evidence indicates that microRNAs (miRNAs) play a significant role in many important bioprocesses; their mutations and disorders can cause various complex diseases. Predicting disease-associated miRNAs via computational approaches helps to identify biomarkers and discover specific medicines, which can greatly reduce the cost of diagnosis, cure, prognosis, and prevention of human diseases. However, achieving more reliable prediction of potential miRNA-disease associations through effective integration of different biological data remains a challenge for researchers. In this study, we proposed a computational model that federates multiple-similarities fusion and space projection (MSFSP). MSFSP first fused the integrated disease similarity (composed of disease semantic similarity, disease functional similarity, and disease Hamming similarity) with the integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity, and miRNA Hamming similarity). Second, it constructed the weighted network of miRNA-disease associations from the experimentally verified Boolean network of miRNA-disease associations by using the similarity networks. Finally, it calculated the prediction results by weighting the miRNA space projection scores and the disease space projection scores. Leave-one-out cross-validation demonstrated that MSFSP achieves an area under the receiver operating characteristic curve (AUC) of 0.9613, better than that of five other existing models. In case studies, the predictive ability of MSFSP was further confirmed: 96% and 98% of the top 50 predictions for prostatic neoplasms and lung neoplasms, respectively, were validated by experimental evidence, and supporting experimental evidence was also found for 100% of the top 50 predictions for isolated diseases.
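The "Hamming similarity" mentioned in this abstract is not defined there. A minimal sketch under one plausible assumption, namely that it is one minus the normalized Hamming distance between two Boolean association profiles, looks like this; the function name and the example profiles are hypothetical.

```python
def hamming_similarity(profile_a, profile_b):
    """1 - (number of differing positions / profile length).

    Each profile is a 0/1 vector, e.g. one miRNA's associations
    across all diseases (a row of the Boolean association matrix).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    if not profile_a:
        return 0.0
    differing = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return 1.0 - differing / len(profile_a)

# Two hypothetical miRNA profiles over four diseases; they differ at one position.
print(hamming_similarity([1, 0, 1, 0], [1, 1, 1, 0]))
```

Under this definition, two entities with identical association profiles score 1 and two with complementary profiles score 0. Shared zeros raise the score just as shared ones do, even though a zero only means that no association is known yet.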
Affiliation(s)
- Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Xiaohui Cheng
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Hanyan Wei
- School of Pharmacy, Guilin Medical University, Guilin, China
|
45
|
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020; 8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022]
Abstract
One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Several useful machine learning approaches have become available recently, with the increasing progress of next-generation sequencing technology; however, existing methods cannot predict sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for prediction of pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine algorithm and incremental feature selection strategy were used to select the optimum feature space vector for training the random forest model RF-PseU. Compared with previous state-of-the-art predictors, the results on the same benchmark data sets of three species demonstrate that RF-PseU performs better overall. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% versus the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
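The incremental feature selection strategy named in this abstract can be sketched generically. The sketch assumes the features have already been ranked (the abstract attributes the ranking to a light gradient boosting machine) and that some scoring routine, here a hypothetical `evaluate` callback standing in for cross-validated random forest accuracy, is available. It is not the authors' code.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best prefix.

    ranked_features: feature names ordered from most to least important
    evaluate:        callable taking a list of features, returning a score
    """
    best_subset, best_score = [], float("-inf")
    for size in range(1, len(ranked_features) + 1):
        subset = ranked_features[:size]
        score = evaluate(subset)
        if score > best_score:            # ties keep the smaller subset
            best_subset, best_score = list(subset), score
    return best_subset, best_score

# Toy scores keyed by subset size, for illustration only.
toy_scores = {1: 0.60, 2: 0.70, 3: 0.65}
print(incremental_feature_selection(["f1", "f2", "f3"],
                                    lambda subset: toy_scores[len(subset)]))
```

Because only prefixes of the ranking are tried, the search costs one model evaluation per feature rather than one per possible subset.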
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
46
|
Sun S, Wang C, Ding H, Zou Q. Machine learning and its applications in plant molecular studies. Brief Funct Genomics 2019; 19:40-48. [DOI: 10.1093/bfgp/elz036] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 01/16/2023]
Abstract
The advent of high-throughput genomic technologies has resulted in the accumulation of massive amounts of genomic information. However, biologists are challenged with how to effectively analyze these data. Machine learning can provide tools for better and more efficient data analysis. Unfortunately, because many plant biologists are unfamiliar with machine learning, its application in plant molecular studies has been restricted to a few species and a limited set of algorithms. Thus, in this study, we provide the basic steps for developing machine learning frameworks and present a comprehensive overview of machine learning algorithms and various evaluation metrics. Furthermore, we introduce sources of important curated plant genomic data and R packages to enable plant biologists to easily and quickly apply appropriate machine learning algorithms in their research. Finally, we discuss current applications of machine learning algorithms for identifying various genes related to resistance to biotic and abiotic stress. Broad application of machine learning and the accumulation of plant sequencing data will advance plant molecular studies.
Affiliation(s)
- Shanwen Sun
- University of Bayreuth in Germany. He is now a postdoctoral fellow at the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
- Chunyu Wang
- Harbin Institute of Technology in China. He is an associate professor in the School of Computer Science and Technology, Harbin Institute of Technology
- Hui Ding
- Inner Mongolia University in China. She is an associate professor in the Center for Informational Biology, University of Electronic Science and Technology of China
- Quan Zou
- Harbin Institute of Technology in China. He is a professor in the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
|
47
|
Zhao T, Wang D, Hu Y, Zhang N, Zang T, Wang Y. Identifying Alzheimer’s Disease-related miRNA Based on Semi-clustering. Curr Gene Ther 2019; 19:216-223. [DOI: 10.2174/1566523219666190924113737] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/30/2019] [Revised: 06/05/2019] [Accepted: 06/12/2019] [Indexed: 01/14/2023]
Abstract
Background: More and more scholars are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms.
Objective: Identifying AD-related miRNAs can help us find new drug targets and support early diagnosis.
Materials and Methods: We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA gets a feature vector representing its correlation with AD. Unlike other studies, we do not randomly generate negative samples in order to apply a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, ‘one-class SVM’. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers).
Results and Conclusion: We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on randomly generated negative samples. The AUC of our method is much higher than that of the SVM, and we carried out case studies to show that our results are reliable.
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Donghua Wang
- Department of General Surgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ningyi Zhang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
48
|
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hengqiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianxin Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shulin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|
49
|
Wang C, Guo J, Zhao N, Liu Y, Liu X, Liu G, Guo M. A Cancer Survival Prediction Method Based on Graph Convolutional Network. IEEE Trans Nanobioscience 2019; 19:117-126. [PMID: 31443039 DOI: 10.1109/tnb.2019.2936398] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 02/01/2023]
Abstract
BACKGROUND AND OBJECTIVE Cancer, one of the most challenging diseases in human history, has always been one of the main threats to human life and health. The high mortality of cancer is largely due to its complexity and the significant differences in clinical outcomes. Improving the accuracy of cancer survival prediction is therefore important and has become one of the main fields of cancer research. Many computational models for cancer survival prediction have been proposed, but most of them build prediction models from only a single type of genomic data or from clinical data. Multiple genomic data and clinical data have not yet been integrated to take a comprehensive view of cancers and predict survival. METHOD To effectively integrate multiple genomic data (including gene expression, copy number alteration, DNA methylation and exon expression) and clinical data for cancer survival prediction, this paper proceeds in three steps. First, the similarity network fusion algorithm (SNF) is used to integrate the multiple genomic data and clinical data and generate a sample similarity matrix. Second, the min-redundancy and max-relevance algorithm (mRMR) is used to perform feature selection on the multiple genomic data and clinical data of cancer samples and generate a sample feature matrix. Finally, the two matrices are used for semi-supervised training of a graph convolutional network (GCN), yielding a cancer survival prediction method that integrates multiple genomic data and clinical data based on a graph convolutional network (GCGCN). RESULT Performance indexes of the GCGCN model indicate that both multiple genomic data and clinical data play significant roles in accurately predicting the survival time of cancer patients. Compared with existing survival prediction methods, GCGCN, which integrates multiple genomic data and clinical data, shows a clearly superior prediction effect. CONCLUSION The study results in this paper verify the effectiveness and superiority of GCGCN for cancer survival prediction.
|
50
|
Xie G, Meng T, Luo Y, Liu Z. SKF-LDA: Similarity Kernel Fusion for Predicting lncRNA-Disease Association. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:45-55. [PMID: 31514111 PMCID: PMC6742806 DOI: 10.1016/j.omtn.2019.07.022] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/22/2019] [Revised: 07/13/2019] [Accepted: 07/24/2019] [Indexed: 01/24/2023]
Abstract
Recently, prediction of lncRNA-disease associations has attracted more and more attention. Various computational models have been proposed; however, there is still room to improve the prediction accuracy. In this paper, we propose a kernel fusion method with different types of similarities for the lncRNAs and diseases. The expression similarity and cosine similarity are used for lncRNAs, and the semantic similarity and cosine similarity are used for the diseases. To reduce the effect of noise, a neighbor constraint is enforced to refine all the similarity matrices before fusion. Experimental results show that the proposed similarity kernel fusion (SKF)-LDA method has superior performance in terms of AUC values and other measurements. Under LOOCV and 5-fold CV, SKF-LDA achieves AUC values of 0.9049 and 0.8743±0.0050, respectively. In addition, case studies of three diseases (hepatocellular carcinoma, lung cancer, and prostate cancer) show that SKF-LDA can predict related lncRNAs accurately.
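Two steps from this abstract, cosine similarity between association profiles and a neighbor constraint that damps noisy similarity entries, can be sketched as below. The abstract gives no formulas, so the weighting rule here is an assumption: a pair keeps full weight when each is among the other's k nearest neighbors, half weight when only one is, and zero otherwise. Function names and the toy matrix are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def neighbor_weights(sim, k):
    """Assumed neighbor-constraint weights derived from k nearest neighbors."""
    n = len(sim)
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        neighbors.append(set(others[:k]))
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                weights[i][j] = 1.0
            else:
                hits = (j in neighbors[i]) + (i in neighbors[j])
                weights[i][j] = hits / 2.0   # 1.0 mutual, 0.5 one-sided, 0.0 neither
    return weights

def refine(sim, weights):
    """Element-wise product of a similarity matrix and its weight matrix."""
    return [[s * w for s, w in zip(s_row, w_row)]
            for s_row, w_row in zip(sim, weights)]

toy_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(refine(toy_sim, neighbor_weights(toy_sim, k=1)))
```

In the toy matrix, the strong 0.9 link survives intact because the two entities are mutual nearest neighbors, while the weak 0.1 link is zeroed out.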
Affiliation(s)
- Guobo Xie
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Tengfei Meng
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Yu Luo
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Zhenguo Liu
- Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
|