51
Li J, Zhang S, Wan Y, Zhao Y, Shi J, Zhou Y, Cui Q. MISIM v2.0: a web server for inferring microRNA functional similarity based on microRNA-disease associations. Nucleic Acids Res 2019; 47:W536-W541. [PMID: 31069374 PMCID: PMC6602518 DOI: 10.1093/nar/gkz328] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 01/27/2019] [Revised: 04/14/2019] [Accepted: 04/25/2019] [Indexed: 01/11/2023] Open
Abstract
MicroRNAs (miRNAs) are a class of important small non-coding RNA molecules that play critical roles in health and disease. It is therefore important to evaluate the functional relationships among miRNAs and to predict novel miRNA-disease associations. For this purpose, we developed the updated web server MISIM (miRNA similarity) v2.0. Besides a 3-fold increase in data content compared with MISIM v1.0, MISIM v2.0 improves the original MISIM algorithm by incorporating both positive and negative miRNA-disease associations. That is, MISIM v2.0 scores can be positive or negative, whereas MISIM v1.0 only produced positive scores. Moreover, MISIM v2.0 implements an algorithm for novel miRNA-disease prediction based on MISIM v2.0 scores. Finally, MISIM v2.0 provides network visualization and functional enrichment analysis for functionally paired miRNAs. The MISIM v2.0 web server is freely accessible at http://www.lirmed.com/misim/.
Affiliation(s)
- Jianwei Li
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Shan Zhang
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
- Yanping Wan
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
- Yingshu Zhao
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
- Jiangcheng Shi
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Yuan Zhou
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Qinghua Cui
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
  - Sanbo Brain Institute, Sanbo Brain Hospital, Capital Medical University, Beijing 100093, China
52
Liu J, Zuo Z, Wu G. Link Prediction Only With Interaction Data and its Application on Drug Repositioning. IEEE Trans Nanobioscience 2020; 19:547-555. [PMID: 32340956 DOI: 10.1109/tnb.2020.2990291] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
To assist drug development, many computational methods have been proposed to identify potential drug-disease treatment associations before wet experiments. Based on the assumption that similar drugs may treat similar diseases, most methods calculate the similarities of drugs and diseases by using various chemical or biological features. However, since these features may be unknown or hard to collect, such methods will not work in the face of incomplete data. Besides, due to the lack of validated negative samples in the drug-disease associations data, most methods have no choice but to simply select some unlabeled samples as negative ones, which may introduce noises and decrease the reliability of prediction. Herein, we propose a new method (TS-SVD) which only uses those known drug-protein, disease-protein and drug-disease interactions to predict the potential drug-disease associations. In a constructed drug-protein-disease heterogeneous network, assuming that drugs/diseases relating to some common proteins or diseases/drugs may be similar, we get the common neighbors count matrix of drugs/diseases, then convert it to a topological similarity matrix. After that, we get low dimensional embedding representations of drug-disease pairs by using topological features and singular value decomposition. Finally, a Random Forest classifier is trained to do the prediction. To train a more reasonable model, we select out some reliable negative samples based on the k -step neighbors relationships between drugs and diseases. Compared with some state-of-the-art methods, we use less information but achieve better or comparable performance. Meanwhile, our strategy for selecting reliable negative samples can improve the performances of these methods. Case studies have further shown the practicality of our method in discovering novel drug-disease associations.
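The common-neighbour and SVD steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' TS-SVD code: the cosine-style normalisation, the square-root scaling of singular values, the function names and the toy drug-protein matrix are all assumptions made for illustration, because the abstract does not specify them. The Random Forest step and the negative-sample selection are not shown.

```python
import numpy as np

def topological_similarity(adjacency: np.ndarray) -> np.ndarray:
    """Cosine-normalised common-neighbour counts between rows of a binary matrix."""
    counts = adjacency @ adjacency.T      # counts[i, j] = number of shared neighbours
    degree = np.sqrt(np.diag(counts))     # sqrt of each node's own neighbour count
    degree[degree == 0] = 1.0             # isolated nodes: avoid division by zero
    return counts / np.outer(degree, degree)

def svd_embedding(similarity: np.ndarray, dim: int) -> np.ndarray:
    """Keep the top `dim` singular directions, scaled by sqrt of singular values."""
    u, s, _ = np.linalg.svd(similarity)
    return u[:, :dim] * np.sqrt(s[:dim])

# Hypothetical toy data: 3 drugs x 4 proteins.
drug_protein = np.array([[1, 1, 0, 0],
                         [1, 1, 0, 0],
                         [0, 0, 1, 1]], dtype=float)
sim = topological_similarity(drug_protein)
emb = svd_embedding(sim, dim=2)
```

In this toy case the first two drugs share all their protein neighbours, so they receive similarity 1 and identical embeddings, while the third drug shares none with them.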
53
Huang F, Qiu Y, Li Q, Liu S, Ni F. Predicting Drug-Disease Associations via Multi-Task Learning Based on Collective Matrix Factorization. Front Bioeng Biotechnol 2020; 8:218. [PMID: 32373595 PMCID: PMC7179666 DOI: 10.3389/fbioe.2020.00218] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/20/2019] [Accepted: 03/04/2020] [Indexed: 12/30/2022] Open
Abstract
Identifying drug-disease associations is integral to drug development. Computationally prioritizing candidate drug-disease associations has attracted growing attention due to its contribution to reducing the cost of laboratory screening. Drug-disease associations involve different association types, such as drug indications and drug side effects. However, the existing models for predicting drug-disease associations merely concentrate on independent tasks: recommending novel indications to benefit drug repositioning, predicting potential side effects to prevent drug-induced risk, or only determining the existence of drug-disease association. They ignore crucial prior knowledge of the correlations between different association types. Since the Comparative Toxicogenomics Database (CTD) annotates the drug-disease associations as therapeutic or marker/mechanism, we consider predicting the two types of association. To this end, we propose a collective matrix factorization-based multi-task learning method (CMFMTL) in this paper. CMFMTL handles the problem as multi-task learning where each task is to predict one type of association, and two tasks complement and improve each other by capturing the relatedness between them. First, drug-disease associations are represented as a bipartite network with two types of links representing therapeutic effects and non-therapeutic effects. Then, CMFMTL, respectively, approximates the association matrix regarding each link type by matrix tri-factorization, and shares the low-dimensional latent representations for drugs and diseases in the two related tasks for the goal of collective learning. Finally, CMFMTL puts the two tasks into a unified framework and an efficient algorithm is developed to solve our proposed optimization problem. In the computational experiments, CMFMTL outperforms several state-of-the-art methods both in the two tasks. 
Moreover, case studies show that CMFMTL helps to find out novel drug-disease associations that are not included in CTD, and simultaneously predicts their association types.
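The idea of sharing latent drug and disease representations across two tri-factorizations can be sketched as follows. This is a simplified illustration under stated assumptions, not the CMFMTL algorithm itself: plain gradient descent on a squared-error loss with L2 regularisation stands in for the authors' optimisation procedure, and all names, hyper-parameters and the toy matrices are hypothetical.

```python
import numpy as np

def reconstruction_error(r_list, u, v, s_list):
    """Sum of squared errors over all association types."""
    return sum(np.sum((u @ s @ v.T - r) ** 2) for r, s in zip(r_list, s_list))

def collective_tri_factorization(r_list, rank=2, lr=0.01, reg=0.01, steps=3000, seed=0):
    """Approximate each R_t by U S_t V^T with U (drugs) and V (diseases) shared."""
    rng = np.random.default_rng(seed)
    n, m = r_list[0].shape
    u = rng.normal(scale=0.1, size=(n, rank))
    v = rng.normal(scale=0.1, size=(m, rank))
    s_list = [np.eye(rank) for _ in r_list]   # one small core matrix per task
    start_error = reconstruction_error(r_list, u, v, s_list)
    for _ in range(steps):
        grad_u, grad_v, grad_s = reg * u, reg * v, []
        for r, s in zip(r_list, s_list):
            err = u @ s @ v.T - r
            grad_u = grad_u + err @ v @ s.T    # both tasks contribute to U
            grad_v = grad_v + err.T @ u @ s    # both tasks contribute to V
            grad_s.append(u.T @ err @ v + reg * s)
        u = u - lr * grad_u
        v = v - lr * grad_v
        s_list = [s - lr * g for s, g in zip(s_list, grad_s)]
    return u, v, s_list, start_error

# Hypothetical toy data: 4 drugs x 3 diseases, two association types.
therapeutic = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
marker = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 0]], dtype=float)
u, v, s_list, start_error = collective_tri_factorization([therapeutic, marker])
end_error = reconstruction_error([therapeutic, marker], u, v, s_list)
```

Because `u` and `v` receive gradient contributions from both matrices, information from one association type shapes the representations used to predict the other, which is the relatedness the abstract refers to.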
Affiliation(s)
- Feng Huang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yang Qiu
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Qiaojun Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - School of Electronic and Information Engineering, Henan Polytechnic Institute, Henan Nanyang, China
- Shichao Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
- Fuchuan Ni
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
54
Jiang HJ, Huang YA, You ZH. SAEROF: an ensemble approach for large-scale drug-disease association prediction by incorporating rotation forest and sparse autoencoder deep neural network. Sci Rep 2020; 10:4972. [PMID: 32188871 PMCID: PMC7080766 DOI: 10.1038/s41598-020-61616-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/18/2019] [Accepted: 02/13/2020] [Indexed: 01/01/2023] Open
Abstract
Drug-disease association is an important piece of information that is used in all stages of drug repositioning. Although the number of drug-disease associations identified by high-throughput technologies is increasing, the experimental methods are time-consuming and expensive. As a supplement to them, many computational methods have been developed for accurate in silico prediction of new drug-disease associations. In this work, we present a novel computational model combining a sparse auto-encoder and rotation forest (SAEROF) to predict drug-disease associations. Gaussian interaction profile kernel similarity, drug structure similarity and disease semantic similarity were extracted to explore the associations among drugs and diseases. On this basis, a rotation forest classifier based on a sparse auto-encoder is proposed to predict the associations between drugs and diseases. To evaluate the performance of the proposed model, we ran 10-fold cross-validation on two gold-standard datasets, Fdataset and Cdataset. The proposed model achieved AUCs (Area Under the ROC Curve) of 0.9092 on Fdataset and 0.9323 on Cdataset. For performance evaluation, we compared SAEROF with the state-of-the-art support vector machine (SVM) classifier and some existing computational models. Three human diseases (Obesity, Stomach Neoplasms and Lung Neoplasms) were explored in case studies. As a result, more than half of the top 20 drugs predicted were confirmed by the Comparative Toxicogenomics Database (CTD). This model is a feasible and effective method to predict drug-disease associations, and its performance is significantly improved compared with existing methods.
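The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract is commonly computed from the rows of a binary association matrix, as in the sketch below. The bandwidth normalisation by the mean squared profile norm is the widely used convention and is an assumption here, as are the function name, the `gamma_prime` parameter and the toy matrix; the abstract does not state the exact settings used in SAEROF.

```python
import numpy as np

def gip_kernel(association: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """GIP similarity between rows: exp(-gamma * ||a_i - a_j||^2)."""
    sq_norms = np.sum(association ** 2, axis=1)
    mean_sq = sq_norms.mean()
    # Normalise the bandwidth by the average squared norm of the profiles.
    gamma = gamma_prime / mean_sq if mean_sq > 0 else gamma_prime
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2 * association @ association.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Hypothetical toy data: 3 drugs x 3 diseases.
drug_disease = np.array([[1, 0, 1],
                         [1, 0, 1],
                         [0, 1, 0]], dtype=float)
drug_sim = gip_kernel(drug_disease)        # drug-drug similarity
disease_sim = gip_kernel(drug_disease.T)   # disease-disease similarity
```

Drugs with identical interaction profiles get similarity 1, and the similarity decays as the profiles diverge.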
Affiliation(s)
- Han-Jing Jiang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Yu-An Huang
  - Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
55
Wang X, Yan R. DDAPRED: a computational method for predicting drug repositioning using regularized logistic matrix factorization. J Mol Model 2020; 26:60. [PMID: 32062701 DOI: 10.1007/s00894-020-4315-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/25/2019] [Accepted: 01/28/2020] [Indexed: 01/14/2023]
Abstract
Due to rising development costs and stagnant product outputs of traditional drug discovery methods, drug repositioning, which discovers new indications for existing drugs, has attracted increasing interest. Computational drug repositioning can integrate prioritization information and accelerate time lines even further. However, most existing methods for predicting drug repositioning have low precisions. The present article proposed a new method named DDAPRED (https://github.com/nongdaxiaofeng/DDAPRED) for drug repositioning prediction. The method integrated multiple sources of drug similarity and disease similarity information, and it used the regularized logistic matrix decomposition method to significantly improve the prediction performance. In 5-fold cross-validation, the areas under the receiver operating characteristic curve (AUROC) and the precision-recall curve (AUPRC) of DDAPRED reached 0.932 and 0.438, respectively, exceeding other methods. The present study also analyzed the parameters influencing the model performance and the effect of different drug similarity information in-depth, and it verified the treatment relationship of the top 50 predictions with unknown relationships in the training set, further demonstrating the practicability of our method.
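A bare-bones form of regularized logistic matrix factorization can be sketched as below. This is only an illustration of the general technique under stated assumptions: it omits the drug and disease similarity information that DDAPRED integrates, and the function names, hyper-parameters and toy matrix are hypothetical. The authors' implementation is at the GitHub URL given in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_mf(y, rank=2, lr=0.1, reg=0.01, steps=500, seed=0):
    """Model P(y_ij = 1) = sigmoid(u_i . v_j) and fit by gradient ascent
    on the L2-regularised log-likelihood."""
    rng = np.random.default_rng(seed)
    u = rng.normal(scale=0.1, size=(y.shape[0], rank))
    v = rng.normal(scale=0.1, size=(y.shape[1], rank))
    for _ in range(steps):
        residual = y - sigmoid(u @ v.T)   # gradient of the log-likelihood w.r.t. logits
        # Simultaneous update: both right-hand sides use the old u and v.
        u, v = (u + lr * (residual @ v - reg * u),
                v + lr * (residual.T @ u - reg * v))
    return sigmoid(u @ v.T)

# Hypothetical toy data: 4 drugs x 4 diseases with two blocks of associations.
y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
probs = logistic_mf(y)
```

The returned matrix holds predicted association probabilities; in a repositioning setting, unobserved pairs with high probability would be the candidates to inspect.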
Affiliation(s)
- Xiaofeng Wang
  - College of Mathematics and Computer Science, Shanxi Normal University, Linfen, 041004, China
- Renxiang Yan
  - College of Biological Science and Engineering, Fuzhou University, Fuzhou, 350106, Fujian, China
  - Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou, 350116, Fujian, China
56
Yue X, Wang Z, Huang J, Parthasarathy S, Moosavinasab S, Huang Y, Lin SM, Zhang W, Zhang P, Sun H. Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 2020; 36:1241-1251. [PMID: 31584634 PMCID: PMC7703771 DOI: 10.1093/bioinformatics/btz718] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Received: 04/09/2019] [Revised: 08/25/2019] [Accepted: 09/26/2019] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Graph embedding learning that aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods are evaluated on social and information networks and are not comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding methods) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. RESULTS We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, protein-protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks. 
AVAILABILITY AND IMPLEMENTATION As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
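Link prediction with learned node embeddings is typically evaluated by scoring candidate edges and computing the area under the ROC curve. The sketch below shows one simple way to do that. It does not use the BioNEV package or its API: scoring an edge by the dot product of its endpoint embeddings is an assumption, and the function names and toy embeddings are hypothetical.

```python
import numpy as np

def edge_scores(embeddings, edges):
    """Score each (i, j) edge by the dot product of its node embeddings."""
    return np.array([embeddings[i] @ embeddings[j] for i, j in edges])

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Hypothetical 2-dimensional embeddings for 4 nodes.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
held_out_edges = [(0, 1), (2, 3)]   # true links hidden during training
sampled_non_edges = [(0, 2), (1, 3)]
auc = auroc(edge_scores(emb, held_out_edges), edge_scores(emb, sampled_non_edges))
```

The pairwise definition is quadratic in the number of edges, which is fine for a small check; a rank-based implementation would be preferable for large test sets.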
Affiliation(s)
- Xiang Yue
  - Department of Computer Science and Engineering, OH, USA
- Zhen Wang
  - Department of Computer Science and Engineering, OH, USA
- Jingong Huang
  - Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA
- Soheil Moosavinasab
  - Research Information Solutions and Innovation, The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
- Yungui Huang
  - Research Information Solutions and Innovation, The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
- Simon M Lin
  - Research Information Solutions and Innovation, The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China
- Ping Zhang
  - Department of Computer Science and Engineering, OH, USA
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Huan Sun
  - Department of Computer Science and Engineering, OH, USA
57
Li Z, Huang Q, Chen X, Wang Y, Li J, Xie Y, Dai Z, Zou X. Identification of Drug-Disease Associations Using Information of Molecular Structures and Clinical Symptoms via Deep Convolutional Neural Network. Front Chem 2020; 7:924. [PMID: 31998700 PMCID: PMC6966717 DOI: 10.3389/fchem.2019.00924] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/29/2019] [Accepted: 12/18/2019] [Indexed: 02/02/2023] Open
Abstract
Identifying drug-disease associations is helpful not only for predicting new drug indications and recognizing lead compounds, but also for preventing, diagnosing and treating diseases. Traditional experimental methods are time-consuming, laborious and expensive. Therefore, there is an urgent need for computational methods that predict potential drug-disease associations on a large scale. Herein, a novel method is proposed to identify drug-disease associations based on deep learning. Molecular structure and clinical symptom information were used to characterize drugs and diseases. Then, a novel two-dimensional matrix was constructed and mapped to a gray-scale image representing each drug-disease association. Finally, a deep convolutional neural network was used to build a model for identifying potential drug-disease associations. The performance of the method was evaluated on the training set and test set, and accuracies of 89.90% and 86.51%, respectively, were obtained. The ability to recognize new drug indications, lead compounds and true drug-disease associations was also investigated and verified in various experiments. Additionally, 3,620,516 potential drug-disease associations were identified, and some of them were further validated through docking modeling. It is anticipated that the proposed method may be a powerful large-scale virtual screening tool for drug research and development. The MATLAB source code is freely available on request from the authors.
Affiliation(s)
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, China
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, China
- Qixing Huang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Xingyu Chen
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Yang Wang
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, China
- Jinlong Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Yun Xie
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Zong Dai
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, China
- Xiaoyong Zou
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, China
58
Liang X, Zhang P, Li J, Fu Y, Qu L, Chen Y, Chen Z. Learning important features from multi-view data to predict drug side effects. J Cheminform 2019; 11:79. [PMID: 33430979 PMCID: PMC6916463 DOI: 10.1186/s13321-019-0402-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/12/2019] [Accepted: 12/05/2019] [Indexed: 02/06/2023] Open
Abstract
The problem of drug side effects is one of the most crucial issues in pharmacological development. As there are many limitations in current experimental and clinical methods for detecting side effects, a lot of computational algorithms have been developed to predict side effects with different types of drug information. However, there is still a lack of methods which could integrate heterogeneous data to predict side effects and select important features at the same time. Here, we propose a novel computational framework based on multi-view and multi-label learning for side effect prediction. Four different types of drug features are collected and graph model is constructed from each feature profile. After that, all the single view graphs are combined to regularize the linear regression functions which describe the relationships between drug features and side effect labels. L1 penalties are imposed on the regression coefficient matrices in order to select features relevant to side effects. Additionally, the correlations between side effect labels are also incorporated into the model by graph Laplacian regularization. The experimental results show that the proposed method could not only provide more accurate prediction for side effects but also select drug features related to side effects from heterogeneous data. Some case studies are also supplied to illustrate the utility of our method for prediction of drug side effects.
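The combination of an L1 penalty on the regression coefficients with a graph Laplacian term over side-effect labels can be illustrated with a small proximal-gradient (ISTA) sketch. This is a simplified, single-view stand-in and not the authors' multi-view formulation or solver: the objective 0.5·||XW − Y||² + (mu/2)·tr(W L Wᵀ) + l1·||W||₁, the function names, the hyper-parameters and the toy data are all assumptions.

```python
import numpy as np

def graph_laplacian(similarity):
    """Unnormalised Laplacian L = D - S of a symmetric similarity graph."""
    return np.diag(similarity.sum(axis=1)) - similarity

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink entries towards zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def sparse_multilabel_regression(x, y, label_sim, l1=0.1, mu=0.1, steps=500):
    lap = graph_laplacian(label_sim)
    # Step size 1/Lipschitz constant of the smooth part of the objective.
    step = 1.0 / (np.linalg.norm(x, 2) ** 2 + mu * np.linalg.norm(lap, 2))
    w = np.zeros((x.shape[1], y.shape[1]))
    for _ in range(steps):
        grad = x.T @ (x @ w - y) + mu * w @ lap   # least squares + label smoothness
        w = soft_threshold(w - step * grad, step * l1)
    return w

# Hypothetical toy data: 4 drugs, 3 features, 2 correlated side-effect labels.
x = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
y = np.array([[1, 1], [1, 1], [0, 0], [0, 0]], dtype=float)
label_sim = np.array([[0, 1], [1, 0]], dtype=float)
w = sparse_multilabel_regression(x, y, label_sim)
```

In this toy problem only the first feature carries signal, so its row of `w` is non-zero while the other rows are driven exactly to zero, which is the feature-selection effect the abstract attributes to the L1 penalty.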
Affiliation(s)
- Xujun Liang
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
- Pengfei Zhang
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
- Jun Li
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
- Ying Fu
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
- Lingzhi Qu
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
- Yongheng Chen
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
- Zhuchu Chen
  - NHC Key Laboratory of Cancer Proteomics, Xiangya Hospital, Central South University, XiangYa Road, Changsha, China
59
Jiang HJ, You ZH, Huang YA. Predicting drug-disease associations via sigmoid kernel-based convolutional neural networks. J Transl Med 2019; 17:382. [PMID: 31747915 PMCID: PMC6868698 DOI: 10.1186/s12967-019-2127-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/01/2019] [Accepted: 11/05/2019] [Indexed: 12/02/2022] Open
Abstract
Background In the process of drug development, computational drug repositioning is effective and resource-saving because of its role in identifying new drug–disease associations. Recent years have witnessed great progress in the field of data mining with the advent of deep learning. An increasing number of deep learning-based techniques have been proposed to develop computational tools in bioinformatics. Methods Along this promising direction, we propose a computational drug repositioning method combining a Sigmoid Kernel and a Convolutional Neural Network (SKCNN), which is able to learn new features that effectively represent drug–disease associations via its hidden layers. Specifically, we first construct a similarity metric for drugs using drug sigmoid similarity and drug structural similarity, and one for diseases using disease sigmoid similarity and disease semantic similarity. Based on the combined similarities of drugs and diseases, we then use SKCNN to learn hidden representations for each drug–disease pair, whose label is finally predicted by a random forest classifier. Results A series of experiments were carried out for performance evaluation, and their results show that the proposed SKCNN improves the prediction accuracy compared with other state-of-the-art approaches. Case studies of two selected diseases are also conducted, which demonstrate the superior performance of our method in terms of the actual discovery of potential drug indications. Conclusion The aim of this study was to establish an effective predictive model for finding new drug–disease associations. These experimental results show that SKCNN can effectively predict the association between drugs and diseases.
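A sigmoid kernel similarity of the kind named in this abstract can be computed from association profiles as in the following sketch. It uses the standard sigmoid (hyperbolic tangent) kernel, tanh(gamma · x·y + coef0); the default `gamma` of one over the number of features, the zero offset, the function name and the toy matrix are assumptions, since the abstract does not give the parameters used in SKCNN.

```python
import numpy as np

def sigmoid_kernel_similarity(profiles, gamma=None, coef0=0.0):
    """Pairwise sigmoid kernel tanh(gamma * <x_i, x_j> + coef0) between rows."""
    profiles = np.asarray(profiles, dtype=float)
    if gamma is None:
        gamma = 1.0 / profiles.shape[1]   # assumed default: 1 / number of features
    return np.tanh(gamma * profiles @ profiles.T + coef0)

# Hypothetical toy data: 3 drugs x 3 diseases.
drug_disease = np.array([[1, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]], dtype=float)
drug_sigmoid_sim = sigmoid_kernel_similarity(drug_disease)
disease_sigmoid_sim = sigmoid_kernel_similarity(drug_disease.T)
```

Unlike a Gaussian kernel, this similarity depends on the overlap of the two profiles rather than their distance, so two drugs with no shared disease get similarity 0 when `coef0` is 0.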
Affiliation(s)
- Han-Jing Jiang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Ürümqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Ürümqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Yu-An Huang
  - Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong
60
Xuan P, Cui H, Shen T, Sheng N, Zhang T. HeteroDualNet: A Dual Convolutional Neural Network With Heterogeneous Layers for Drug-Disease Association Prediction via Chou's Five-Step Rule. Front Pharmacol 2019; 10:1301. [PMID: 31780934 PMCID: PMC6856670 DOI: 10.3389/fphar.2019.01301] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 11/14/2022] Open
Abstract
Identifying new treatments for existing drugs can help reduce drug development costs and explore novel indications of drugs. The prediction of associations between drugs and diseases is challenging because their similarities and relations are complicated and non-linear. We propose a HeteroDualNet model to address this issue. Firstly, three types of matrices are extracted to represent intra-drug similarities, intra-disease similarity and drug-disease associations. The intra-drug similarities consider three drug features and a newly introduced drug-related disease correlation. Secondly, an embedding mechanism is proposed to integrate these matrices in a heterogenous drug-disease association layer (hetero-layer). Further, a neighbouring heterogeneous layer (hetero-layer-N) is constructed to incorporate the biological premise that similar drugs can often treat related diseases. Finally, a dual convolutional neural network is built with hetero-layer and hetero-layer-N as two branches to learn from characteristics of drug-disease and the relations of their neighbours simultaneously. HeteroDualNet outperformed the other four methods in comparison over a public dataset of 763 drugs and 681 diseases in terms of Areas Under the Curves of Receiver Operating Characteristics and Precision-Recall, and recall rate at top k. Case study of five drugs further proved the capacity of HeteroDualNet in finding reliable disease candidates of drugs as validated by database records or literature. Our findings show that the embedded heterogenous layers of original and neighbouring drug-disease representations in a dual neural network improved the association prediction performance.
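The "recall rate at top k" used in this evaluation can be computed as in the sketch below: the fraction of a drug's known associated diseases that appear among its k highest-scoring candidates. The function name and the toy scores are hypothetical, and the exact protocol in the paper (for example, how candidates are pooled across drugs) is not stated in the abstract.

```python
import numpy as np

def recall_at_k(scores, labels, k):
    """Fraction of all positives that are ranked within the top k by score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    top = np.argsort(-scores, kind="stable")[:k]   # indices of the k highest scores
    positives = labels.sum()
    return float(labels[top].sum() / positives) if positives else 0.0

# Hypothetical predictions for one drug over four candidate diseases.
scores = [0.9, 0.1, 0.8, 0.3]
labels = [1, 0, 0, 1]          # 1 = known association
r2 = recall_at_k(scores, labels, k=2)
r4 = recall_at_k(scores, labels, k=4)
```

Here one of the two known associations is ranked in the top 2, so recall at k=2 is 0.5, and it reaches 1.0 once all four candidates are included.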
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Bundoora, VIC, Australia
- Tonghui Shen
  - School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Nan Sheng
  - School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin, China
61
Predicting Drug-Disease Associations via Using Gaussian Interaction Profile and Kernel-Based Autoencoder. Biomed Res Int 2019; 2019:2426958. [PMID: 31534955 PMCID: PMC6732622 DOI: 10.1155/2019/2426958] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/18/2019] [Revised: 07/05/2019] [Accepted: 07/22/2019] [Indexed: 01/18/2023]
Abstract
Computational drug repositioning, designed to identify new indications for existing drugs, significantly reduces the cost and time involved in drug development. Prediction of drug-disease associations is promising for drug repositioning. Recent years have witnessed an increasing number of machine learning-based methods for computational drug repositioning. In this paper, a novel feature learning method based on the Gaussian interaction profile kernel and an autoencoder (GIPAE) is proposed for drug-disease association prediction. To further reduce the computation cost, both a batch normalization layer and a fully connected layer are introduced to reduce training complexity. The experimental results of 10-fold cross-validation indicate that the proposed method achieves superior performance on Fdataset and Cdataset, with AUCs of 93.30% and 96.03%, respectively, which are higher than those of many previous computational models. To further assess the accuracy of GIPAE, we conducted case studies on two complex human diseases. Of the top 20 drugs predicted for each disease, 14 obesity-related drugs and 11 drugs related to Alzheimer's disease were validated in the CTD database. The results of cross-validation and case studies indicate that GIPAE is a reliable model for predicting drug-disease associations.
62
Xuan P, Song Y, Zhang T, Jia L. Prediction of Potential Drug-Disease Associations through Deep Integration of Diversity and Projections of Various Drug Features. Int J Mol Sci 2019; 20:4102. [PMID: 31443472 PMCID: PMC6747548 DOI: 10.3390/ijms20174102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/02/2019] [Revised: 08/19/2019] [Accepted: 08/20/2019] [Indexed: 11/17/2022] Open
Abstract
Identifying new indications for existing drugs may reduce costs and expedite drug development. Drug-related disease prediction methods typically combine heterogeneous drug-related and disease-related data to derive the associations between drugs and diseases. Recently developed approaches integrate multiple kinds of drug features but fail to take the diversity implied by these features into account. We developed a method based on non-negative matrix factorization, DivePred, for predicting potential drug–disease associations. DivePred integrated disease similarity, drug–disease associations, and various drug features derived from drug chemical substructures, drug target protein domains, drug target annotations, and drug-related diseases. Diverse drug features reflect the characteristics of drugs from different perspectives, and utilizing the diversity of multiple kinds of features is critical for association prediction. The various drug features were high-dimensional and sparse, so DivePred projected them into a low-dimensional feature space to generate dense feature representations of drugs. Furthermore, DivePred’s optimization term enhanced diversity and reduced redundancy of multiple kinds of drug features. The neighbor information was exploited to infer the likelihood of drug–disease associations. Experiments indicated that DivePred was superior to several state-of-the-art methods for predicting drug–disease associations. During the validation process, DivePred identified more drug–disease associations in the top part of the prediction results than other methods, benefitting further biological validation. Case studies of acetaminophen, ciprofloxacin, doxorubicin, hydrocortisone, and ampicillin demonstrated that DivePred has the ability to discover potential candidate disease indications for drugs.
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yingying Song
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Lan Jia
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
|
63
|
Inferring Drug-Related Diseases Based on Convolutional Neural Network and Gated Recurrent Unit. Molecules 2019; 24:molecules24152712. [PMID: 31349692 PMCID: PMC6696443 DOI: 10.3390/molecules24152712] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/13/2019] [Revised: 07/18/2019] [Accepted: 07/19/2019] [Indexed: 12/15/2022] Open
Abstract
Predicting novel uses for drugs using their chemical, pharmacological, and indication information contributes to minimizing costs and development periods. Most previous prediction methods focused on integrating the similarity and association information of drugs and diseases. However, they tended to construct shallow prediction models to predict drug-associated diseases, which makes deeply integrating the information difficult. Further, path information between drugs and diseases is important auxiliary information for association prediction, yet it is not deeply integrated. We present a deep learning-based method, CGARDP, for predicting drug-related candidate disease indications. CGARDP establishes a feature matrix by exploiting a variety of biological premises related to drugs and diseases. A novel model based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) is constructed to learn the local and path representations for a drug-disease pair. The CNN-based framework on the left of the model learns the local representation of the drug-disease pair from their feature matrix. In the right part, a GRU-based framework learns the path representation based on path information between the drug and the disease. As the different paths have discriminative contributions to the drug-disease association prediction, we construct an attention mechanism at the path level to learn the informative paths. Cross-validation results indicate that CGARDP performs better than several state-of-the-art methods. Further, CGARDP retrieves more real drug-disease associations in the top part of the prediction result, which is the part of most concern to biologists. Case studies on five drugs demonstrate that CGARDP can discover potential drug-related disease indications.
|
64
|
Xuan P, Ye Y, Zhang T, Zhao L, Sun C. Convolutional Neural Network and Bidirectional Long Short-Term Memory-Based Method for Predicting Drug-Disease Associations. Cells 2019; 8:E705. [PMID: 31336774 PMCID: PMC6679344 DOI: 10.3390/cells8070705] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 06/09/2019] [Revised: 07/08/2019] [Accepted: 07/09/2019] [Indexed: 12/16/2022] Open
Abstract
Identifying novel indications for approved drugs can accelerate drug development and reduce research costs. Most previous studies used shallow models for prioritizing the potential drug-related diseases and failed to deeply integrate the paths between drugs and diseases, which may contain additional association information. A deep-learning-based method for predicting drug-disease associations by integrating useful information is needed. We proposed a novel method based on a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM), CBPred, for predicting drug-related diseases. Our method deeply integrates similarities and associations between drugs and diseases, and paths among drug-disease pairs. The CNN-based framework focuses on learning the original representation of a drug-disease pair from their similarities and associations. As the drug-disease association possibility also depends on the multiple paths between them, the BiLSTM-based framework mainly learns the path representation of the drug-disease pair. In addition, considering that different paths have discriminative contributions to the association prediction, an attention mechanism at the path level is constructed. Our method, CBPred, showed better performance and retrieved more real associations at the front of the results, which is more important for biologists. Case studies further confirmed that CBPred can discover potential drug-disease associations.
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yilin Ye
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Lianfeng Zhao
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Chang Sun
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
|
65
|
Abstract
We present a bipartite graph-based approach to calculate drug pairwise similarity for identifying potential new indications of approved drugs. Both chemical and molecular features were used in the drug similarity calculation. In this paper, we first extracted drug chemical structures and drug-target interactions. Second, we computed chemical structure similarity and drug-target profile similarity. Further, we constructed a bipartite graph model with known relationships between drugs and their target proteins. Finally, we computed a weighted sum of drug structure similarity and target profile similarity to derive drug pairwise similarity, so that we can predict a potential indication of a drug from its similar drugs. In addition, we summarized some alternative strategies and variations as a follow-up to each section of the overall analysis.
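The final step described in this abstract, a weighted sum of structure similarity and target-profile similarity, can be sketched as follows. The abstract does not name the similarity measure or the weight, so the Tanimoto (Jaccard) coefficient, the weight `alpha`, and the toy feature sets below are assumptions for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(struct_a, struct_b, targets_a, targets_b, alpha=0.5):
    """Weighted sum of structure similarity and target-profile similarity.

    alpha is a hypothetical mixing weight in [0, 1].
    """
    structure_sim = tanimoto(struct_a, struct_b)
    target_sim = tanimoto(targets_a, targets_b)
    return alpha * structure_sim + (1 - alpha) * target_sim


# Hypothetical substructure fingerprints and target sets for two drugs.
drug_a_struct = {"f1", "f2", "f3"}
drug_b_struct = {"f2", "f3", "f4"}
drug_a_targets = {"P1", "P2"}
drug_b_targets = {"P1", "P2"}

similarity = combined_similarity(
    drug_a_struct, drug_b_struct, drug_a_targets, drug_b_targets
)
```

Here the structure similarity is 2/4 and the target similarity is 1, so equal weighting gives 0.75. Indications of the most similar drugs would then serve as candidate indications for the query drug.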
|
66
|
Wu G, Liu J, Yue X. Prediction of drug-disease associations based on ensemble meta paths and singular value decomposition. BMC Bioinformatics 2019; 20:134. [PMID: 30925858 PMCID: PMC6439991 DOI: 10.1186/s12859-019-2644-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022] Open
Abstract
Background In the field of drug repositioning, it is assumed that similar drugs may treat similar diseases; therefore, many existing computational methods need to compute the similarities of drugs and diseases. However, the calculation of similarity depends on the adopted measure and the available features, which may cause the similarity scores to vary dramatically from one measure to another, and it does not work when the data are incomplete. Besides, supervised learning-based methods usually need both positive and negative samples to train the prediction models, whereas drug-disease pair data contain only some verified interactions (positive samples) and a lot of unlabeled pairs. To train the models, many methods simply treat the unlabeled samples as negative ones, which may introduce artificial noise. Herein, we propose a method to predict drug-disease associations without the need for similarity information, and to select more likely negative samples. Results In the proposed EMP-SVD (Ensemble Meta Paths and Singular Value Decomposition), we introduce five meta paths corresponding to different kinds of interaction data, and for each meta path we generate a commuting matrix. Every matrix is factorized by SVD into two low-rank matrices, which are used as the latent features of drugs and diseases, respectively. The features are combined to represent drug-disease pairs. We build a base classifier via Random Forest for each meta path, and the five base classifiers are combined as the final ensemble classifier. In order to train a more reliable prediction model, we select more likely negative samples from the unlabeled samples under the assumption that a non-associated drug and disease pair has no common interacting proteins. The experiments have shown that the proposed EMP-SVD method outperforms several state-of-the-art approaches. Case studies by literature investigation have found that the proposed EMP-SVD can mine many drug-disease associations, which implies the practicality of EMP-SVD. Conclusions The proposed EMP-SVD can integrate the interaction data among drugs, proteins and diseases, and predict drug-disease associations without the need for similarity information. At the same time, the strategy of selecting more reliable negative samples benefits the prediction.
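Two ideas in this abstract lend themselves to a small sketch: the commuting matrix of a drug-protein-disease meta path (the product of the two interaction matrices along the path), and the selection of likely negative samples as unlabeled pairs that share no protein. The toy matrices and variable names below are hypothetical, and the SVD and Random Forest stages are omitted because they would require a numerical library.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical toy data: 3 drugs, 2 proteins, 2 diseases.
drug_protein = [[1, 1], [0, 1], [0, 0]]   # rows: drugs, columns: proteins
protein_disease = [[1, 0], [1, 1]]        # rows: proteins, columns: diseases
known_positives = {(0, 0)}                # verified (drug, disease) pairs

# Entry (i, j) counts the drug-protein-disease path instances
# that connect drug i to disease j.
commuting = matmul(drug_protein, protein_disease)

# Likely negatives: unlabeled pairs with no shared protein, that is,
# a zero entry in the commuting matrix.
likely_negatives = [
    (i, j)
    for i, row in enumerate(commuting)
    for j, count in enumerate(row)
    if count == 0 and (i, j) not in known_positives
]
```

With these inputs the commuting matrix is `[[2, 1], [1, 1], [0, 0]]`, so only the pairs involving the third drug are selected as likely negatives.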
Affiliation(s)
- Guangsheng Wu
- School of Computer Science, Wuhan University, Wuhan, 430072, People's Republic of China
- Juan Liu
- School of Computer Science, Wuhan University, Wuhan, 430072, People's Republic of China; Suzhou Institute of Wuhan University, Suzhou, 215123, People's Republic of China
- Xiang Yue
- School of Computer Science, Wuhan University, Wuhan, 430072, People's Republic of China; Department of Computer Science and Engineering, The Ohio State University, Ohio, 43210, USA
|
67
|
Tian Z, Teng Z, Cheng S, Guo M. Computational drug repositioning using meta-path-based semantic network analysis. BMC SYSTEMS BIOLOGY 2018; 12:134. [PMID: 30598084 PMCID: PMC6311940 DOI: 10.1186/s12918-018-0658-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 02/01/2023]
Abstract
BACKGROUND Drug repositioning is a promising and efficient way to discover new indications for existing drugs, which holds great potential for precision medicine in the post-genomic era. Many network-based approaches have been proposed for drug repositioning based on similarity networks, which integrate multiple sources of drug and disease data. However, these methods may simply view all nodes as being of the same type and neglect the semantic meanings of different meta-paths in the heterogeneous network. Therefore, it is urgent to develop a rational method to infer new indications for approved drugs. RESULTS In this study, we proposed a novel methodology named HeteSim_DrugDisease (HSDD) for drug repositioning prediction. Firstly, we build the drug-drug similarity network and the disease-disease similarity network by integrating the information of drugs and diseases. Secondly, a drug-disease heterogeneous network is constructed, which combines the drug similarity network, the disease similarity network, and the known drug-disease association network. Finally, HSDD predicts novel drug-disease associations based on the HeteSim scores of different meta-paths. The experimental results show that HSDD performs significantly better than the existing state-of-the-art approaches. HSDD achieves an AUC score of 0.8994 in the leave-one-out cross validation experiment. Moreover, case studies for selected drugs further illustrate the practical usefulness of HSDD. CONCLUSIONS HSDD can be an effective and feasible way to infer the associations between drugs and diseases using meta-path-based semantic network analysis.
Affiliation(s)
- Zhen Tian
- School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, People's Republic of China
- Zhixia Teng
- School of information and computer engineering, Northeast Forestry, Harbin, 150001, People's Republic of China
- Shuang Cheng
- Institute of Materials, China Academy of Engineering Physics, Jiang You, 621907, Sichuan, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, People's Republic of China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, 100044, China
|
68
|
Yu SP, Liang C, Xiao Q, Li GH, Ding PJ, Luo JW. MCLPMDA: A novel method for miRNA-disease association prediction based on matrix completion and label propagation. J Cell Mol Med 2018; 23:1427-1438. [PMID: 30499204 PMCID: PMC6349206 DOI: 10.1111/jcmm.14048] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 07/03/2018] [Accepted: 11/02/2018] [Indexed: 12/20/2022] Open
Abstract
MiRNAs are a class of small non‐coding RNAs that are involved in the development and progression of various complex diseases. Great efforts have recently been made to discover potential associations between miRNAs and diseases. As experimental methods are in general expensive and time‐consuming, a large number of computational models have been developed to effectively predict reliable disease‐related miRNAs. However, the inherent noise and incompleteness in the existing biological datasets have inevitably limited the prediction accuracy of current computational models. To solve this issue, in this paper, we propose a novel method for miRNA‐disease association prediction based on matrix completion and label propagation. Specifically, our method first reconstructs a new miRNA/disease similarity matrix by a matrix completion algorithm based on known experimentally verified miRNA‐disease associations and then utilizes the label propagation algorithm to reliably predict disease‐related miRNAs. As a result, MCLPMDA achieved comparable performance under different evaluation metrics and was capable of discovering a greater number of true miRNA‐disease associations. Moreover, a case study conducted on Breast Neoplasms further confirmed the prediction reliability of the proposed method. Taken together, the experimental results clearly demonstrated that MCLPMDA can serve as an effective and reliable tool for miRNA‐disease association prediction.
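The label propagation stage mentioned in this abstract can be illustrated with a generic scheme in which scores are repeatedly smoothed over a row-normalised similarity matrix while being pulled back toward the known labels: F(t+1) = alpha * S * F(t) + (1 - alpha) * Y. This is a textbook-style sketch, not MCLPMDA's exact update rule; `alpha`, `iterations`, and the toy similarity matrix are assumptions, and the matrix completion stage is not shown.

```python
def label_propagation(similarity, labels, alpha=0.5, iterations=50):
    """Propagate known labels over a similarity graph for one disease."""
    n = len(similarity)
    # Row-normalise the similarity matrix so each non-empty row sums to 1.
    norm = []
    for row in similarity:
        total = sum(row)
        norm.append([v / total if total else 0.0 for v in row])
    scores = list(labels)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * labels[i]
            for i in range(n)
        ]
    return scores


# Hypothetical miRNA similarity matrix; only miRNA 0 is known to be
# associated with the disease of interest.
mirna_similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
known_labels = [1.0, 0.0, 0.0]
scores = label_propagation(mirna_similarity, known_labels)
```

The unlabeled miRNA 1, which is highly similar to the known miRNA 0, ends up with a higher score than miRNA 2, so it would be ranked first among the candidates.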
Affiliation(s)
- Sheng-Peng Yu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Guang-Hui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Ping-Jian Ding
- College of Information Science and Engineering, Hunan University, Changsha, China
- Jia-Wei Luo
- College of Information Science and Engineering, Hunan University, Changsha, China
|
69
|
Zhang W, Yue X, Lin W, Wu W, Liu R, Huang F, Liu F. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinformatics 2018; 19:233. [PMID: 29914348 PMCID: PMC6006580 DOI: 10.1186/s12859-018-2220-4] [Citation(s) in RCA: 154] [Impact Index Per Article: 22.0] [Received: 10/11/2017] [Accepted: 05/28/2018] [Indexed: 02/06/2023] Open
Abstract
Background Drug-disease associations provide important information for drug discovery. Wet experiments that identify drug-disease associations are time-consuming and expensive, and many drug-disease associations are still unobserved or unknown. The development of computational methods for predicting unobserved drug-disease associations is an important and urgent task. Results In this paper, we proposed a similarity constrained matrix factorization method for drug-disease association prediction (SCMFDD), which makes use of known drug-disease associations, drug features and disease semantic information. SCMFDD projects the drug-disease association relationship into two low-rank spaces, which uncover latent features for drugs and diseases, and then introduces drug feature-based similarities and disease semantic similarity as constraints for drugs and diseases in the low-rank spaces. Unlike the classic matrix factorization technique, SCMFDD takes the biological context of the problem into account. In computational experiments, the proposed method produces high-accuracy performance on benchmark datasets and outperforms existing state-of-the-art prediction methods when evaluated by five-fold cross validation and independent testing. Conclusion We developed a user-friendly web server by using known associations collected from the CTD database, available at http://www.bioinfotech.cn/SCMFDD/. The case studies show that the server can find novel associations that are not included in the CTD database.
Affiliation(s)
- Wen Zhang
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Xiang Yue
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Weiran Lin
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Wenjian Wu
- School of Electronic Information, Wuhan University, Wuhan, 430072, China
- Ruoqi Liu
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Feng Huang
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Feng Liu
- School of Computer Science, Wuhan University, Wuhan, 430072, China
|
70
|
Predicting drug-disease interactions by semi-supervised graph cut algorithm and three-layer data integration. BMC Med Genomics 2017; 10:79. [PMID: 29297383 PMCID: PMC5751445 DOI: 10.1186/s12920-017-0311-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 11/22/2022] Open
Abstract
Background Prediction of drug-disease interactions is promising for both drug repositioning and disease treatment. The discovery of novel drug-disease interactions can, on the one hand, help to find novel indications for approved drugs and, on the other hand, provide new therapeutic approaches for diseases. Recently, computational methods for finding drug-disease interactions have attracted much attention because of their far higher efficiency and lower cost compared with traditional wet experiment methods. However, they still face several challenges, such as the organization of the heterogeneous data and the performance of the model. Methods In this work, we propose to hierarchically integrate the heterogeneous data into three layers. The drug-drug and disease-disease similarities are first calculated separately in each layer, and then the similarities from the three layers are linearly fused into comprehensive drug similarities and disease similarities, which can then be used to measure the similarity between two drug-disease pairs. We construct a novel weighted drug-disease pair network, where a node is a drug-disease pair with a known or unknown treatment relation, and an edge represents the node-node relation, weighted with the similarity score between the two pairs. Since similar drug-disease pairs are supposed to show similar treatment patterns, we can find the optimal graph cut of the network. A drug-disease pair with an unknown relation can then be considered to have a treatment relation similar to that of the pairs within the same cut. Therefore, we develop a semi-supervised graph cut algorithm, SSGC, to find the optimal graph cut, based on which we can identify the potential drug-disease treatment interactions. Results In comparison with three representative network-based methods, SSGC achieves the highest performance in terms of both AUC score and the identification rate of true drug-disease pairs. The experiments with different integration strategies also demonstrate that considering several sources of data can improve the performance of the predictors. In further case studies on four diseases, the top-ranked drug-disease associations were confirmed by KEGG, the CTD database and the literature, illustrating the usefulness of SSGC. Conclusions The proposed comprehensive similarity scores from multiple views and multiple layers and the graph cut-based algorithm can greatly improve the prediction performance for drug-disease associations.
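The data integration step in this abstract (linear fusion of per-layer similarities, followed by a similarity between two drug-disease pairs) can be sketched as below. The abstract does not give the fusion weights or the pair-similarity formula, so the equal weights and the product of drug and disease similarities are assumptions for illustration; the graph cut itself is not shown.

```python
def fuse(layers, weights):
    """Linearly fuse same-shaped similarity matrices with the given weights."""
    size = len(layers[0])
    return [
        [sum(w * layer[i][j] for w, layer in zip(weights, layers))
         for j in range(size)]
        for i in range(size)
    ]


def pair_similarity(drug_sim, disease_sim, pair_a, pair_b):
    """Similarity between two (drug, disease) pairs.

    Assumed here to be the product of the drug-drug and
    disease-disease similarities of their members.
    """
    (drug_a, disease_a), (drug_b, disease_b) = pair_a, pair_b
    return drug_sim[drug_a][drug_b] * disease_sim[disease_a][disease_b]


# Hypothetical drug similarities from three layers for two drugs.
drug_layers = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.4], [0.4, 1.0]],
    [[1.0, 0.6], [0.6, 1.0]],
]
drug_sim = fuse(drug_layers, [1 / 3, 1 / 3, 1 / 3])
disease_sim = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical fused disease similarity

# Edge weight between pair (drug 0, disease 0) and pair (drug 1, disease 1).
edge_weight = pair_similarity(drug_sim, disease_sim, (0, 0), (1, 1))
```

With these inputs the fused drug similarity is about 0.4 and the edge weight about 0.2; such weights would label the edges of the drug-disease pair network on which the cut is computed.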
|
71
|
LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction. PLoS Comput Biol 2017; 13:e1005912. [PMID: 29253885 PMCID: PMC5749861 DOI: 10.1371/journal.pcbi.1005912] [Citation(s) in RCA: 186] [Impact Index Per Article: 23.3] [Received: 07/28/2017] [Revised: 01/02/2018] [Accepted: 12/01/2017] [Indexed: 12/17/2022] Open
Abstract
Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs’ potential roles as diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs/diseases’ statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and an L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model’s superior prediction accuracy, and the average AUC of 0.9181+/-0.0004 in 5-fold cross validation supported its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA. Respectively, 98%, 88%, 96%, 98% and 98% of the top 50 predictions were validated by experimental evidence. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction.
Discovering miRNA-disease associations promotes the understanding of the molecular mechanisms of various human diseases at the miRNA level, and contributes to the development of diagnostic biomarkers and treatment tools for diseases. Computational models can make the discovery more efficient and experiments more productive. LRSSLMDA was proposed to computationally infer potential miRNA-disease associations by adopting sparse subspace learning with Laplacian regularization on the known miRNA-disease association network and the informative feature profiles extracted from the integrated miRNA/disease similarity networks. Experimental results in global and local leave-one-out cross validation and 5-fold cross validation showed a superior prediction performance of LRSSLMDA over previous models. Moreover, three types of case studies on five important human diseases were carried out to further demonstrate the model’s predictive power: respectively, 98%, 88%, 96%, 98% and 98% of the top 50 predicted miRNAs were confirmed by the experimental literature. So, we believe that LRSSLMDA could make reliable predictions and might guide future experimental studies on miRNA-disease associations.
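To show why Laplacian regularization preserves local structure, the sketch below evaluates the standard penalty tr(F^T L F), with L = D - W for a symmetric similarity matrix W, and checks that it equals half the similarity-weighted sum of squared distances between feature vectors. It illustrates only this general identity; the toy matrices and function names are assumptions, and LRSSLMDA's full objective, including the L1-norm term, is not reproduced.

```python
def laplacian(weights):
    """Graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(weights, features):
    """tr(F^T L F), where row i of F is the feature vector of node i."""
    lap = laplacian(weights)
    n = len(weights)
    return sum(
        lap[i][j] * sum(a * b for a, b in zip(features[i], features[j]))
        for i in range(n)
        for j in range(n)
    )


def pairwise_penalty(weights, features):
    """0.5 * sum_ij W_ij * ||f_i - f_j||^2, the equivalent pairwise form."""
    n = len(weights)
    return 0.5 * sum(
        weights[i][j]
        * sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        for i in range(n)
        for j in range(n)
    )


# Hypothetical similarity graph over three miRNAs and 2-D projected features.
similarity = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
projected = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

penalty = laplacian_penalty(similarity, projected)
```

Nodes 1 and 2 are strongly connected and have identical features, so they contribute nothing to the penalty; the cost comes entirely from the weakly connected, dissimilar pair (0, 1). Minimising this term therefore keeps similar nodes close together in the learned subspace.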
|