101
|
Zhang L, Liu T, Chen H, Zhao Q, Liu H. Predicting lncRNA-miRNA interactions based on interactome network and graphlet interaction. Genomics 2021; 113:874-880. [PMID: 33588070 DOI: 10.1016/j.ygeno.2021.02.002] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 10/04/2020] [Revised: 01/10/2021] [Accepted: 02/09/2021] [Indexed: 02/06/2023]
Abstract
In the development and treatment of many human diseases, the regulatory roles between lncRNAs and miRNAs are important, but much remains unknown about them; moreover, experimental methods for analyzing them are expensive and time-consuming. In this work, we applied a semi-supervised interactome network-based approach to explore and forecast the latent interaction between lncRNAs and miRNAs. We constructed graphs according to the similarity of each of lncRNAs and miRNAs and determined the number of graphlet interaction isomers between nodes in these two graphs. According to the two graphs and the known interactive relationship, we calculated a score for lncRNA-miRNA pairs, as the prediction result. The results showed that the model (LMI-INGI) was reliable in fivefold cross-validation (AUC = 0.8957, PRE = 0.6815, REC = 0.8842, F1 score = 0.7452, AUPR = 0.9213). We also tested the model with data based on the similarity of expression profile and similarity of function for verifying the applicability of LMI-INGI, and the resulting AUC value was 0.9197 and 0.9006, respectively. Compared with the other four algorithms and variable similarity tests, our model successfully demonstrated superiority and good generalizability. LMI-INGI would be helpful in forecasting interactions between lncRNAs and miRNAs.
Affiliation(s)
- Li Zhang
  - School of Life Science, Liaoning University, Shenyang, 110036, China; Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Liaoning University, Shenyang, 110036, China; Technology Innovation Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Shenyang, 110036, China
- Ting Liu
  - School of Life Science, Liaoning University, Shenyang, 110036, China; China Medical University, The Queen's University of Belfast Joint College, Shenyang, 110122, China
- Haoyu Chen
  - School of Information, Liaoning University, Shenyang, 110036, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Hongsheng Liu
  - Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Liaoning University, Shenyang, 110036, China; Technology Innovation Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Shenyang, 110036, China; School of Pharmacy, Liaoning University, Shenyang, 110036, China
|
102
|
Guo L, Shi K, Wang L. MLPMDA: Multi-layer linear projection for predicting miRNA-disease association. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106718] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]
|
103
|
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioproc E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data, such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized. In addition, the review provides an in-depth analysis of the remaining challenges and limitations, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
|
104
|
|
105
|
Liu Y, Guo Y, Liu X, Wang C, Guo M. Pathogenic gene prediction based on network embedding. Brief Bioinform 2020; 22:6053103. [PMID: 33367541 DOI: 10.1093/bib/bbaa353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/10/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022]
Abstract
In disease research, the study of gene-disease correlation has always been an important topic. With the emergence of large-scale connected data sets in biology, we use known correlations between the entities, which may be from different sets, to build a biological heterogeneous network and propose a new network embedded representation algorithm to calculate the correlation between disease and genes, using the correlation score to predict pathogenic genes. Then, we conduct several experiments to compare our method to other state-of-the-art methods. The results reveal that our method achieves better performance than the traditional methods.
Affiliation(s)
- Yang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuchen Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
|
106
|
HAUBRW: Hybrid algorithm and unbalanced bi-random walk for predicting lncRNA-disease associations. Genomics 2020; 112:4777-4787. [PMID: 33348478 DOI: 10.1016/j.ygeno.2020.08.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/03/2020] [Revised: 08/01/2020] [Accepted: 08/17/2020] [Indexed: 01/24/2023]
Abstract
A growing body of research shows that long non-coding RNA plays a key role in many important biological processes. However, the number of disease-related lncRNAs found by researchers remains relatively small, and experimental identification is time-consuming and labor-intensive. In this study, we propose a novel method, namely HAUBRW, to predict undiscovered lncRNA-disease associations. First, the hybrid algorithm, which combines the heat spread algorithm and the probability diffusion algorithm, redistributes the resources. Second, an unbalanced bi-random walk is used to infer undiscovered lncRNA-disease associations. Seven advanced models, i.e. BRWLDA, DSCMF, RWRlncD, IDLDA, KATZ, Ping's, and Yang's, were compared with our method, and simulation results show that the AUC of our method is higher than that of the other models. In addition, case studies have shown that HAUBRW can effectively predict candidate lncRNAs for breast cancer, osteosarcoma, and cervical cancer. Therefore, our approach may be a good choice in future biomedical research.
|
107
|
Ding Y, Tang J, Guo F. The Computational Models of Drug-target Interaction Prediction. Protein Pept Lett 2020; 27:348-358. [PMID: 30968771 DOI: 10.2174/0929866526666190410124110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/04/2019] [Revised: 02/22/2019] [Accepted: 04/02/2019] [Indexed: 12/19/2022]
Abstract
The identification of Drug-Target Interactions (DTIs) is an important process in drug discovery and medical research. However, traditional experimental methods for DTI identification are still time-consuming, extremely expensive, and challenging. In the past ten years, various computational methods have been developed to identify potential DTIs. In this paper, the identification methods of DTIs are summarized. In addition, several state-of-the-art computational methods are introduced, including network-based and machine learning-based methods. In particular, machine learning-based methods, including supervised and semi-supervised models, differ essentially in how they handle negative samples. Although these computational models have achieved significant improvements in the identification of DTIs, network-based and machine learning-based methods each have their own disadvantages. These computational methods are evaluated on four benchmark data sets via values of Area Under the Precision Recall curve (AUPR).
Affiliation(s)
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
108
|
Yang Y, Fan C, Zhao Q. Recent Advances on the Machine Learning Methods in Identifying Phage Virion Proteins. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191203155511] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
In the field of bioinformatics, the prediction of phage virion proteins helps us understand the interaction between phage and its host cells and promotes the development of new antibacterial drugs. However, traditional experimental methods to identify phage virion proteins are expensive and inefficient, so more researchers are working to develop new computational methods. In this review, we summarize the machine learning methods proposed in recent years for predicting phage virion proteins and briefly describe their advantages and limitations. Finally, some research directions related to phage virion proteins are listed.
Affiliation(s)
- Yingjuan Yang
  - School of Mathematics, Liaoning University, Shenyang, 110036, China
- Chunlong Fan
  - College of Computer Science, Shenyang Aerospace University, Shenyang, 110136, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
|
109
|
Xiao Y, Xiao Z, Feng X, Chen Z, Kuang L, Wang L. A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs. BMC Bioinformatics 2020; 21:555. [PMID: 33267800 PMCID: PMC7709313 DOI: 10.1186/s12859-020-03906-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/22/2019] [Accepted: 11/25/2020] [Indexed: 12/25/2022]
Abstract
Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases is useful for the diagnosis and treatment of diseases. Due to the high cost and time consumption of traditional bio-experiments, more and more computational methods have been proposed in recent years to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in the integration of direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which guarantees that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and an Artificial Neural Network (ANN), respectively, for data analysis. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (fivefold CV), 10-Fold Cross Validation (tenfold CV) and Leave-One-Out Cross Validation (LOOCV), respectively, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV, respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. In comparison with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.
Affiliation(s)
- Yubin Xiao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, People's Republic of China
- Zheng Xiao
  - Hunan Province Key Laboratory of Tumor Cellular and Molecular Pathology, Cancer Research Institute, University of South China, Hengyang, 421001, Hunan, People's Republic of China
- Xiang Feng
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China
- Zhiping Chen
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China
- Linai Kuang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, People's Republic of China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, People's Republic of China
|
110
|
Wang M, Zhu P. MRWMDA: A novel framework to infer miRNA-disease associations. Biosystems 2020; 199:104292. [PMID: 33221377 DOI: 10.1016/j.biosystems.2020.104292] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/22/2020] [Revised: 10/31/2020] [Accepted: 11/15/2020] [Indexed: 01/03/2023]
Abstract
MicroRNAs (miRNAs) are widely involved in a series of significant biological processes, which have been revealed and verified by accumulating experimental studies. The computational inference of the correlation between miRNAs and diseases is essential to facilitate the detection of disease biomarkers for disease diagnosis, prevention, treatment and prognosis. In this paper, a model with Multiple use of Random Walk with restart algorithm was introduced for the prediction of the MiRNA-Disease Association (MRWMDA). Based on diverse similarity measures, the model first implemented the random walk with restart (RWR) algorithm on the integrated similarity network to construct the topological similarity of miRNAs and diseases, which took full advantage of the network topology information. Then, the RWR algorithm was applied in the miRNA topological similarity network, and a steady probability of each miRNA-disease pair was obtained to prioritize miRNA candidates. In particular, the initial probability of the RWR algorithm was determined by utilizing the combination of the recommendation algorithm and the maximum similarity method. The proposed model achieved significant improvement in prediction compared with previous models, with an AUC of 0.9353 and an AUPR of 0.4809. In addition, case studies of breast neoplasms and lung neoplasms representing different disease types further demonstrated the excellent ability of MRWMDA in detecting potential disease-associated miRNAs. These performance analyses indicated that MRWMDA could be an effective and powerful biological computational tool in relevant biomedical studies.
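The random walk with restart (RWR) step mentioned in this abstract can be illustrated generically. The following is a minimal sketch of the standard RWR iteration on a small similarity network, not the MRWMDA implementation: the function names, the toy three-node graph, the restart probability of 0.5, and the convergence tolerance are all assumptions chosen for illustration.

```python
def column_normalize(adj):
    """Return a column-stochastic copy of a square adjacency matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol.

    adj     : square list-of-lists of non-negative edge weights (hypothetical input)
    seed    : initial probability mass per node, normalized internally
    restart : probability of jumping back to the seed distribution (assumed value)
    """
    n = len(adj)
    w = column_normalize(adj)
    total = sum(seed)
    p0 = [x / total for x in seed]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph 0 - 1 - 2 with all restart mass on node 0.
toy_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(toy_adj, seed=[1, 0, 0], restart=0.5)
print([round(x, 4) for x in scores])  # [0.5833, 0.3333, 0.0833]
```

In this toy case the steady-state scores decrease with distance from the seed node, which is the property that lets RWR rank candidates by their proximity to known associations.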
Affiliation(s)
- Meixi Wang
  - School of Science, Jiangnan University, Wuxi 214122, China
- Ping Zhu
  - School of Science, Jiangnan University, Wuxi 214122, China
|
111
|
Toprak A, Eryilmaz E. Prediction of miRNA-disease associations based on Weighted K-Nearest known neighbors and network consistency projection. J Bioinform Comput Biol 2020; 19:2050041. [PMID: 33148093 DOI: 10.1142/s0219720020500419] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
MicroRNAs (miRNAs) are a type of non-coding RNA molecule that influences the formation and progression of many different diseases. Various studies have reported that miRNAs play a major role in the prevention, diagnosis, and treatment of complex human diseases. In recent years, researchers have made a tremendous effort to find potential relationships between miRNAs and diseases. Since the experimental techniques used to find new miRNA-disease relationships are time-consuming and expensive, many computational techniques have been developed. In this study, Weighted K-Nearest Known Neighbors and Network Consistency Projection techniques were proposed to predict new miRNA-disease relationships using various types of knowledge, such as known miRNA-disease relationships, functional similarity of miRNAs, and disease semantic similarity. Average AUCs of 0.9037 and 0.9168 were obtained with our method in 5-fold and leave-one-out cross validation, respectively. Case studies of breast, lung, and colon neoplasms were conducted to assess the performance of the proposed technique, and the results confirmed the predictive reliability of this method. Therefore, the reported experimental results show that our proposed method can be used as a reliable computational model to reveal potential relationships between miRNAs and diseases.
Affiliation(s)
- Ahmet Toprak
  - Department of Electricity and Energy, Bozkır Vocational School, Selcuk University, Konya, Turkey
- Esma Eryilmaz
  - Department of Biomedical Engineering, Faculty of Technology, Selcuk University, Konya, Turkey
|
112
|
Wu TR, Yin MM, Jiao CN, Gao YL, Kong XZ, Liu JX. MCCMF: collaborative matrix factorization based on matrix completion for predicting miRNA-disease associations. BMC Bioinformatics 2020; 21:454. [PMID: 33054708 PMCID: PMC7556955 DOI: 10.1186/s12859-020-03799-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/18/2020] [Accepted: 10/02/2020] [Indexed: 02/06/2023]
Abstract
Background: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Among the methods to explore the relationship between the miRNA and the disease, traditional methods are time-consuming and the accuracy needs to be improved. In view of the shortcoming of previous models, a method, collaborative matrix factorization based on matrix completion (MCCMF) is proposed to predict the unknown miRNA-disease associations. Results: The complete matrix of the miRNA and the disease is obtained by matrix completion. Moreover, Gaussian Interaction Profile kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix. Then the Weight K Nearest Known Neighbors method is used to pretreat the association matrix, so the model is close to the reality. Finally, collaborative matrix factorization method is applied to obtain the prediction results. Therefore, the MCCMF obtains a satisfactory result in the fivefold cross-validation, with an AUC of 0.9569 (0.0005). Conclusions: The AUC value of MCCMF is higher than other advanced methods in the fivefold cross validation experiment. In order to comprehensively evaluate the performance of MCCMF, accuracy, precision, recall and f-measure are also added. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by researching three specific diseases.
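The Gaussian Interaction Profile (GIP) kernel named in this abstract can be sketched as follows. This is a generic illustration rather than the MCCMF code: it assumes the commonly used form K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with the bandwidth gamma set to a tunable factor divided by the mean squared norm of the profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy association matrix are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 association matrix.

    profiles    : list of rows; row i is the interaction profile of entity i
                  (for example, one miRNA's associations across all diseases)
    gamma_prime : bandwidth factor (assumed default of 1.0)
    """
    n = len(profiles)
    sq_norms = [sum(x * x for x in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are zero")
    gamma = gamma_prime / mean_sq  # normalize bandwidth by average profile size
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy example: three entities, three association columns.
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
similarity = gip_kernel(toy_profiles)
```

Entities with identical profiles receive a similarity of 1, and similarity decays as the profiles diverge, which is why such a kernel is often combined with functional or semantic similarity matrices when those are sparse.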
Affiliation(s)
- Tian-Ru Wu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Meng-Meng Yin
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Cui-Na Jiao
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Ying-Lian Gao
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Xiang-Zhen Kong
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 276826, China
|
113
|
Zheng K, You ZH, Wang L, Guo ZH. iMDA-BN: Identification of miRNA-disease associations based on the biological network and graph embedding algorithm. Comput Struct Biotechnol J 2020; 18:2391-2400. [PMID: 33005302 PMCID: PMC7508695 DOI: 10.1016/j.csbj.2020.08.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/04/2020] [Revised: 08/24/2020] [Accepted: 08/26/2020] [Indexed: 11/30/2022]
Abstract
Benefiting from advances in high-throughput experimental techniques, important regulatory roles of miRNAs, lncRNAs, and proteins, as well as biological property information, are gradually being complemented. As the key data support to promote biomedical research, domain knowledge such as intermolecular relationships that are increasingly revealed by molecular genome-wide analysis is often used to guide the discovery of potential associations. However, the method of performing network representation learning from the perspective of the global biological network is scarce. These methods cover a very limited type of molecular associations and are therefore not suitable for more comprehensive analysis of molecular network representation information. In this study, we propose a computational model based on the Biological network for predicting potential associations between miRNAs and diseases called iMDA-BN. The iMDA-BN has three significant advantages: I) It uses a new method to describe disease and miRNA characteristics which analyzes node representation information for disease and miRNA from the perspective of biological networks. II) It can predict unproven associations even if miRNAs and diseases do not appear in the biological network. III) Accurate description of miRNA characteristics from biological properties based on high-throughput sequence information. The iMDA-BN predictor achieves an AUC of 0.9145 and an accuracy of 84.49% on the miRNA-disease association baseline dataset, and it can also achieve an AUC of 0.8765 and an accuracy of 80.96% when predicting unknown diseases and miRNAs in the biological network. Compared to existing miRNA-disease association prediction methods, iMDA-BN has higher accuracy and the advantage of predicting unknown associations. In addition, 45, 49, and 49 of the top 50 miRNA-disease associations with the highest predicted scores were confirmed in the case studies, respectively.
Affiliation(s)
- Kai Zheng
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Lei Wang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Zhen-Hao Guo
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
|
114
|
Jafari M, Wang Y, Amiryousefi A, Tang J. Unsupervised Learning and Multipartite Network Models: A Promising Approach for Understanding Traditional Medicine. Front Pharmacol 2020; 11:1319. [PMID: 32982738 PMCID: PMC7479204 DOI: 10.3389/fphar.2020.01319] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 05/19/2020] [Accepted: 08/07/2020] [Indexed: 12/11/2022]
Abstract
The ultimate goal of precision medicine is to determine the right treatment for the right patients based on precise diagnosis. To achieve this goal, correct stratification of patients using molecular features and clinical phenotypes is crucial. During the long history of medical science, our understanding of disease classification has been improved greatly by chemistry and molecular biology. Nowadays, high-throughput technologies give us access to large-scale patient-derived data, generating a greater need for data science, including unsupervised learning and network modeling. Unsupervised learning methods such as clustering could be a better solution to stratify patients when there is a lack of predefined classifiers. In network modularity analysis, clustering methods can also be applied to elucidate the complex structure of biological and disease networks at the systems level. In this review, we went over the main points of clustering analysis and network modeling, particularly in the context of Traditional Chinese medicine (TCM). We showed that this approach can provide novel insights into the rationale of classification for TCM herbs. In a case study, using a modularity analysis of multipartite networks, we illustrated that the TCM classifications are associated with the chemical properties of the herb ingredients. We concluded that multipartite network modeling may become a suitable data integration tool for understanding the mechanisms of action of traditional medicine.
Affiliation(s)
- Mohieddin Jafari
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Yinyin Wang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Ali Amiryousefi
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
|
115
|
Ding Y, Tian LP, Lei X, Liao B, Wu FX. Variational graph auto-encoders for miRNA-disease association prediction. Methods 2020; 192:25-34. [PMID: 32798654 DOI: 10.1016/j.ymeth.2020.08.004] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 06/11/2020] [Revised: 08/03/2020] [Accepted: 08/08/2020] [Indexed: 02/07/2023]
Abstract
Cumulative experimental studies have demonstrated the critical roles of microRNAs (miRNAs) in the diverse fundamental and important biological processes, and in the development of numerous complex human diseases. Thus, exploring the relationships between miRNAs and diseases is helpful with understanding the mechanisms, the detection, diagnosis, and treatment of complex diseases. As the identification of miRNA-disease associations via traditional biological experiments is time-consuming and expensive, an effective computational prediction method is appealing. In this study, we present a deep learning framework with variational graph auto-encoder for miRNA-disease association prediction (VGAE-MDA). VGAE-MDA first gets the representations of miRNAs and diseases from the heterogeneous networks constructed by miRNA-miRNA similarity, disease-disease similarity, and known miRNA-disease associations. Then, VGAE-MDA constructs two sub-networks: miRNA-based network and disease-based network. Combining the representations based on the heterogeneous network, two variational graph auto-encoders (VGAE) are deployed for calculating the miRNA-disease association scores from two sub-networks, respectively. Lastly, VGAE-MDA obtains the final predicted association score for a miRNA-disease pair by integrating the scores from these two trained networks. Unlike the previous model, the VGAE-MDA can mitigate the effect of noises from random selection of negative samples. Besides, the use of graph convolutional neural (GCN) network can naturally incorporate the node features from the graph structure while the variational autoencoder (VAE) makes use of latent variables to predict associations from the perspective of data distribution. The experimental results show that VGAE-MDA outperforms the state-of-the-art approaches in miRNA-disease association prediction. Besides, the effectiveness of our model has been further demonstrated by case studies.
Affiliation(s)
- Yulian Ding
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Li-Ping Tian
- School of Information, Beijing Wuzi University, Beijing 101125, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada; Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada; Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
|
116
|
Huang F, Yue X, Xiong Z, Yu Z, Liu S, Zhang W. Tensor decomposition with relational constraints for predicting multiple types of microRNA-disease associations. Brief Bioinform 2020; 22:5876601. [PMID: 32725161 DOI: 10.1093/bib/bbaa140] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/09/2020] [Revised: 05/27/2020] [Accepted: 06/06/2020] [Indexed: 01/02/2023]
Abstract
MicroRNAs (miRNAs) play crucial roles in multifarious biological processes associated with human diseases. Identifying potential miRNA-disease associations contributes to understanding the molecular mechanisms of miRNA-related diseases. Most existing computational methods focus on predicting whether a miRNA-disease association exists or not. However, the roles of miRNAs in diseases are markedly diverse. For instance, genetic variants of a miRNA (mir-15) may affect the expression level of miRNAs, leading to B cell chronic lymphocytic leukemia, while circulating miRNAs (including mir-1246, mir-1307-3p, etc.) have the potential to detect breast cancer at an early stage. In this paper, we aim to predict multi-type miRNA-disease associations instead of treating them as binary. To this end, we represent miRNA-disease-type triples as a tensor and introduce tensor decomposition methods to solve the prediction task. Experimental results on two widely adopted miRNA-disease datasets, HMDD v2.0 and HMDD v3.2, show that tensor decomposition methods improve on a recent baseline by a large margin (up to 38% in Top-1 F1). We then propose a novel method, Tensor Decomposition with Relational Constraints (TDRC), which incorporates biological features as relational constraints to further improve the existing tensor decomposition methods. Compared with two existing tensor decomposition methods, TDRC produces better performance while being more efficient.
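To make the tensor view of miRNA-disease-type triples concrete, the following is a minimal sketch of how a generic rank-R CP (CANDECOMP/PARAFAC) factorization scores a triple. It is not the TDRC method: the relational constraints and the fitting procedure are omitted, and the function names and the toy factor matrices are assumptions introduced only for illustration.

```python
def cp_score(mirna_vec, disease_vec, type_vec):
    """Score one (miRNA, disease, type) triple under a rank-R CP model.

    Each argument is the length-R latent vector of one entity; the score is
    the sum over components of the three-way product.
    """
    return sum(a * b * c for a, b, c in zip(mirna_vec, disease_vec, type_vec))


def reconstruct_tensor(A, B, C):
    """Rebuild the full score tensor T[i][j][k] from factor matrices.

    A, B, C hold one latent row per miRNA, disease and association type.
    """
    return [[[cp_score(a, b, c) for c in C] for b in B] for a in A]


def best_type(mirna_vec, disease_vec, C):
    """Return the index of the highest-scoring association type for a pair."""
    scores = [cp_score(mirna_vec, disease_vec, c) for c in C]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical rank-2 factors: 2 miRNAs, 1 disease, 2 association types.
A = [[1.0, 0.0], [0.5, 0.5]]
B = [[1.0, 2.0]]
C = [[1.0, 0.0], [0.0, 1.0]]
T = reconstruct_tensor(A, B, C)
```

In practice the factor matrices would be learned from the known triples; here they are fixed by hand so the scoring step can be read in isolation.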
Affiliation(s)
- Feng Huang
- College of Informatics, Huazhong Agricultural University
- Xiang Yue
- Department of Computer Science & Engineering, The Ohio State University
- Zhankun Xiong
- College of Informatics, Huazhong Agricultural University
- Zhouxin Yu
- College of Informatics, Huazhong Agricultural University
- Shichao Liu
- College of Informatics, Huazhong Agricultural University
|
117
|
Abstract
Multidrug resistance (MDR) is a vital issue in cancer treatment. Drug resistance can develop through a variety of mechanisms, including increased drug efflux, activation of detoxifying systems and DNA repair mechanisms, and escape from drug-induced apoptosis. Identifying the exact mechanism involved in a particular case is a difficult task. Proteomics is the large-scale study of proteins, particularly their expression, structures and functions. In recent years, comparative proteomic methods have been used to analyze MDR mechanisms in drug-selected model cancer cell lines. In this paper, we review recent progress in using comparative proteomic approaches to identify potential MDR mechanisms in drug-selected model cancer cell lines, which may help in understanding and designing chemical sensitizers.
|
118
|
Luo J, Long Y. NTSHMDA: Prediction of Human Microbe-Disease Association Based on Random Walk by Integrating Network Topological Similarity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1341-1351. [PMID: 30489271 DOI: 10.1109/tcbb.2018.2883041] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 05/23/2023]
Abstract
Accumulating clinical evidence has demonstrated that the microbes residing in human bodies play an important role in the formation, development, and progression of various complex human diseases. Identifying latent disease-related microbes could provide insight into human disease mechanisms and promote disease prevention, diagnosis, and treatment. In this paper, we first construct a heterogeneous network by connecting the disease similarity network and the microbe similarity network through the known microbe-disease association network, and then develop a novel computational model to predict human microbe-disease associations based on random walk by integrating network topological similarity (NTSHMDA). Specifically, each microbe-disease association pair is regarded as a distinct relationship level and is thus assigned a different weight based on network topological similarity. The experimental results show that NTSHMDA outperforms some state-of-the-art methods, with average AUCs of 0.9070 and 0.8896 ± 0.0038 under leave-one-out cross-validation and 5-fold cross-validation, respectively. In case studies, 9, 18, and 38 of the top-10, 20, and 50 candidate microbes for asthma, and 9, 18, and 45 of the top-10, 20, and 50 candidate microbes for inflammatory bowel disease, are verified by recently published literature. In conclusion, NTSHMDA has the potential to identify novel disease-microbe associations and can also provide valuable information for drug discovery and biological research.
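The random-walk step that methods of this kind build on can be sketched as a plain random walk with restart on a single weighted network. This is a generic illustration under stated assumptions, not the NTSHMDA algorithm: the topology-based edge weighting and the heterogeneous microbe-disease network are not reproduced, and the function name, the restart probability of 0.3, and the toy graph are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adj   : square list-of-lists adjacency (or weight) matrix
    seeds : indices of the nodes the walker restarts from
    W is adj with every column normalized to sum to 1.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]

    # Restart distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path graph 0 - 1 - 2 - 3 with the walker restarting at node 0.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
proximity = random_walk_with_restart(path, seeds=[0])
```

The stationary vector ranks every node by its proximity to the seeds; candidate associations are then read off by sorting the non-seed entries.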
|
119
|
Sumathipala M, Weiss ST. Predicting miRNA-based disease-disease relationships through network diffusion on multi-omics biological data. Sci Rep 2020; 10:8705. [PMID: 32457435 PMCID: PMC7251138 DOI: 10.1038/s41598-020-65633-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/21/2020] [Accepted: 05/07/2020] [Indexed: 12/18/2022]
Abstract
With critical roles in regulating gene expression, miRNAs are strongly implicated in the pathophysiology of many complex diseases. Experimental methods to determine disease-related miRNAs are time-consuming and costly. Computationally predicting miRNA-disease associations has potential applications in finding miRNA therapeutic pathways and in understanding the role of miRNAs in disease-disease relationships. In this study, we propose the MiRNA-disease Association Prediction (MAP) method, an in silico method to predict and prioritize miRNA-disease associations. The MAP method applies a network diffusion approach, starting from the known disease genes in a heterogeneous network constructed from miRNA-gene associations, protein-protein interactions, and gene-disease associations. Validation using experimental data on miRNA-disease associations demonstrated superior performance to two current state-of-the-art methods, with areas under the ROC curve all over 0.8 for four types of cancer. MAP is successfully applied to predict differential miRNA expression in four cancer types. Most strikingly, disease-disease relationships in terms of shared miRNAs revealed hidden disease subtyping comparable to that of previous work on shared genes between diseases, with applications for multi-omics characterization of disease relationships.
Affiliation(s)
- Marissa Sumathipala
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Harvard College, Cambridge, MA, USA
- Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
|
120
|
Lin Y, Huang G, Jin H, Jian Z. Circular RNA Gprc5a Promotes HCC Progression by Activating YAP1/TEAD1 Signalling Pathway by Sponging miR-1283. Onco Targets Ther 2020; 13:4509-4521. [PMID: 32547082 PMCID: PMC7247601 DOI: 10.2147/ott.s240261] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/27/2019] [Accepted: 04/18/2020] [Indexed: 12/18/2022]
Abstract
BACKGROUND Circular RNA (circRNA) plays a critical role in tumorigenesis and tumor progression. Many studies indicate that circRNA Gprc5a is significantly upregulated and functions as an oncogene in a variety of cancers. However, the molecular mechanism of circGprc5a in liver cancer remains unclear. METHODS qRT-PCR was used to measure the expression levels of circGprc5a, miR-1283, YAP1 and TEAD1 mRNA in hepatocellular carcinoma (HCC) tissues or cells. YAP1 and TEAD1 protein levels were detected by Western blot. CCK-8, cell colony formation, BrdU incorporation and Annexin V-FITC/PI assays were performed to analyze the effects of circGprc5a and miR-1283 on cell proliferation and apoptosis. The relationship between circGprc5a, miR-1283, YAP1 and TEAD1 was analyzed using bioinformatic analysis and luciferase assays. Tumor changes in mice were assessed by in vivo experiments. RESULTS CircGprc5a was highly expressed in liver cancer and closely related to poor survival of patients with liver cancer. Knockout of circGprc5a inhibited proliferation of HCC cells and induced apoptosis. CircGprc5a activated the YAP1/TEAD1 signaling pathway by acting as a sponge for miR-1283. Furthermore, overexpression of miR-1283 abolished the promoting effect of circGprc5a on HCC cells. Therefore, miR-1283 expression correlated negatively with circGprc5a expression yet positively with the expression of YAP1/TEAD1 in liver cancer. CONCLUSION CircGprc5a promoted the development of HCC by inhibiting the expression of miR-1283 and activating the YAP1/TEAD1 signaling pathway.
Affiliation(s)
- Ye Lin
- Department of General Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, People’s Republic of China
- Guanqun Huang
- Department of General Surgery, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510700, People’s Republic of China
- Haosheng Jin
- Department of General Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, People’s Republic of China
- Zhixiang Jian
- Department of General Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, People’s Republic of China
|
121
|
Li J, Zhang S, Wan Y, Zhao Y, Shi J, Zhou Y, Cui Q. MISIM v2.0: a web server for inferring microRNA functional similarity based on microRNA-disease associations. Nucleic Acids Res 2019; 47:W536-W541. [PMID: 31069374 PMCID: PMC6602518 DOI: 10.1093/nar/gkz328] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 01/27/2019] [Revised: 04/14/2019] [Accepted: 04/25/2019] [Indexed: 01/11/2023]
Abstract
MicroRNAs (miRNAs) are an important class of small non-coding RNA molecules and play critical roles in health and disease. Therefore, it is important to evaluate the functional relationships of miRNAs and then predict novel miRNA-disease associations. For this purpose, we developed the updated web server MISIM (miRNA similarity) v2.0. Besides a 3-fold increase in data content compared with MISIM v1.0, MISIM v2.0 improved the original MISIM algorithm by incorporating both positive and negative miRNA-disease associations. That is, the MISIM v2.0 scores can be positive or negative, whereas MISIM v1.0 only produced positive scores. Moreover, MISIM v2.0 implemented an algorithm for novel miRNA-disease prediction based on MISIM v2.0 scores. Finally, MISIM v2.0 provided network visualization and functional enrichment analysis for functionally paired miRNAs. The MISIM v2.0 web server is freely accessible at http://www.lirmed.com/misim/.
Affiliation(s)
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Shan Zhang
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
- Yanping Wan
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
- Yingshu Zhao
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
- Jiangcheng Shi
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Yuan Zhou
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Qinghua Cui
- Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China; Sanbo Brain Institute, Sanbo Brain Hospital, Capital Medical University, Beijing 100093, China
|
122
|
Wang CC, Zhao Y, Chen X. Drug-pathway association prediction: from experimental results to computational models. Brief Bioinform 2020; 22:5835554. [PMID: 32393976 DOI: 10.1093/bib/bbaa061] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/13/2020] [Revised: 03/16/2020] [Accepted: 03/26/2020] [Indexed: 12/14/2022]
Abstract
Effective drugs are urgently needed to overcome complex human diseases. However, the research and development of novel drugs is time-consuming and costly. Traditional drug discovery follows the rule of one drug, one target, while some studies have demonstrated that drugs generally act by affecting related pathways rather than a single target. Thus, a new strategy of drug discovery, namely pathway-based drug discovery, has been proposed. Identifying associations between drugs and pathways plays a key role in the development of pathway-based drug discovery. Revealing drug-pathway associations by experimental methods takes much time and money. Therefore, some computational models have been established to predict potential drug-pathway associations. In this review, we first introduce the background of drugs and the concept of drug-pathway associations. Then, we list some publicly accessible databases and web servers about drug-pathway associations. Next, we summarize some state-of-the-art computational methods from recent years for inferring drug-pathway associations and divide these methods into three classes, namely Bayesian sparse factor-based, matrix decomposition-based and other machine learning methods. In addition, we introduce several evaluation strategies to estimate the predictive performance of various computational models. Finally, we discuss the advantages and limitations of existing computational methods and provide some suggestions about future directions for data collection and computational model development.
|
123
|
Zhang Y, Chen M, Cheng X, Wei H. MSFSP: A Novel miRNA-Disease Association Prediction Model by Federating Multiple-Similarities Fusion and Space Projection. Front Genet 2020; 11:389. [PMID: 32425980 PMCID: PMC7204399 DOI: 10.3389/fgene.2020.00389] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/28/2020] [Accepted: 03/27/2020] [Indexed: 12/11/2022]
Abstract
Growing evidence has indicated that microRNAs (miRNAs) play a significant role in many important bioprocesses; their mutations and disorders cause various complex diseases. Predicting disease-associated miRNAs via computational approaches helps to identify biomarkers and discover specific medicines, which can greatly reduce the cost of diagnosis, cure, prognosis, and prevention of human diseases. However, achieving a more reliable prediction of potential miRNA-disease associations through effective integration of different biological data remains a challenge for researchers. In this study, we proposed a computational model that federates multiple-similarities fusion and space projection (MSFSP). MSFSP first fused the integrated disease similarity (composed of disease semantic similarity, disease functional similarity, and disease Hamming similarity) with the integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity, and miRNA Hamming similarity). Second, it constructed the weighted network of miRNA-disease associations from the experimentally verified Boolean network of miRNA-disease associations by using similarity networks. Finally, it calculated the prediction results by weighting the miRNA space projection scores and the disease space projection scores. Leave-one-out cross-validation demonstrated that MSFSP has high predictive accuracy, with an area under the receiver operating characteristic curve (AUC) of 0.9613, better than that of five other existing models. In case studies, the predictive ability of MSFSP was further confirmed: 96% and 98% of the top 50 predictions for prostatic neoplasms and lung neoplasms, respectively, were successfully validated by experimental evidence, and supporting experimental evidence was also found for 100% of the top 50 predictions for isolated diseases.
Affiliation(s)
- Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Xiaohui Cheng
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Hanyan Wei
- School of Pharmacy, Guilin Medical University, Guilin, China
|
124
|
Wang W, Lv H, Zhao Y, Liu D, Wang Y, Zhang Y. DLS: A Link Prediction Method Based on Network Local Structure for Predicting Drug-Protein Interactions. Front Bioeng Biotechnol 2020; 8:330. [PMID: 32391341 PMCID: PMC7193019 DOI: 10.3389/fbioe.2020.00330] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/17/2019] [Accepted: 03/25/2020] [Indexed: 12/22/2022]
Abstract
Studies of drug-protein interactions (DPIs) are significant for drug repositioning, drug discovery, and clinical medicine. Confirming DPIs through in vitro biochemical experiments is time-consuming and costly, and such interactions are difficult to estimate. Therefore, a feasible solution is to predict DPIs efficiently with computers. We propose a link prediction method based on drug-protein interaction (DPI) local structural similarity (DLS) for predicting DPIs. The DLS method combines link prediction and binary network structure to predict DPIs. Ten-fold cross-validation was applied in the experiment. Compared with the improved similarity-based network prediction method, DLS produces significantly better results on the test set. Moreover, several candidate proteins were predicted for three approved drugs, namely captopril, desferrioxamine and losartan, and these predictions were further validated by the literature. In addition, the combination of the Common Neighborhood (CN) method and the DLS method provides a new idea for the integrated application of link prediction methods.
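As a rough illustration of neighbourhood-based link prediction on a drug-protein network, the sketch below scores an unobserved drug-protein pair by summing, over every other drug already known to bind that protein, the number of proteins it shares with the query drug. This is one common way to adapt the common-neighbour idea to a two-sided network and is an assumption made here for illustration; it is not the DLS similarity itself, and the function name and toy interactions are hypothetical.

```python
def common_neighbour_scores(interactions):
    """Score every unobserved (drug, protein) pair.

    interactions : dict mapping each drug to the set of proteins it is known
                   to interact with.
    A pair (d, p) scores higher when drugs that bind p share many known
    proteins with d.
    """
    proteins = set().union(*interactions.values())
    scores = {}
    for drug, known in interactions.items():
        for protein in proteins - known:
            scores[(drug, protein)] = sum(
                len(known & other_known)
                for other, other_known in interactions.items()
                if other != drug and protein in other_known
            )
    return scores


# Hypothetical toy network: drug A shares two proteins with drug B.
toy = {
    "A": {"p1", "p2"},
    "B": {"p1", "p2", "p3"},
    "C": {"p4"},
}
ranked = sorted(common_neighbour_scores(toy).items(),
                key=lambda item: item[1], reverse=True)
```

Sorting the scores gives a candidate list per drug; in the toy network the pair (A, p3) ranks first because A and B overlap on two proteins and B binds p3.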
Affiliation(s)
- Wei Wang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China; Big Data Engineering Laboratory for Teaching Resources and Assessment of Education Quality, Xinxiang, China
- Hehe Lv
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Yuan Zhao
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Dong Liu
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Yongqing Wang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China; Big Data Engineering Laboratory for Teaching Resources and Assessment of Education Quality, Xinxiang, China
- Yu Zhang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
|
125
|
Ji BY, You ZH, Cheng L, Zhou JR, Alghazzawi D, Li LP. Predicting miRNA-disease association from heterogeneous information network with GraRep embedding model. Sci Rep 2020; 10:6658. [PMID: 32313121 PMCID: PMC7170854 DOI: 10.1038/s41598-020-63735-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 01/20/2020] [Accepted: 03/16/2020] [Indexed: 12/27/2022]
Abstract
In recent years, accumulating evidence has shown that microRNA (miRNA) plays an important role in the exploration and treatment of diseases, so the detection of associations between miRNAs and diseases has drawn more and more attention. However, traditional experimental methods are costly and time-consuming, whereas a computational method can help predict potential miRNA-disease associations more systematically and effectively. In this work, we proposed a novel network embedding-based heterogeneous information integration method to predict miRNA-disease associations. More specifically, a heterogeneous information network is constructed by combining the known associations among lncRNA, drug, protein, disease, and miRNA. After that, the network embedding method Learning Graph Representations with Global Structural Information (GraRep) is employed to learn embeddings of nodes in the heterogeneous information network. The embedding representations of miRNA and disease are then integrated with the attribute information of miRNA and disease (e.g. miRNA sequence information and disease semantic similarity) to represent miRNA-disease association pairs. Finally, the Random Forest (RF) classifier is used for predicting potential miRNA-disease associations. Under 5-fold cross-validation, our method obtained 85.11% prediction accuracy with 80.41% sensitivity and an AUC of 91.25%. In addition, in case studies of three major human diseases, 45 (Colon Neoplasms), 42 (Breast Neoplasms) and 44 (Esophageal Neoplasms) of the top-50 predicted miRNAs are respectively verified by other miRNA-disease association databases. In conclusion, the experimental results suggest that our method can be a powerful and useful tool for predicting potential miRNA-disease associations.
Affiliation(s)
- Bo-Ya Ji
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhu-Hong You
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Li Cheng
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Ji-Ren Zhou
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Daniyal Alghazzawi
- Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Li-Ping Li
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
|
126
|
Predicting potential miRNA-disease associations by combining gradient boosting decision tree with logistic regression. Comput Biol Chem 2020; 85:107200. [DOI: 10.1016/j.compbiolchem.2020.107200] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 09/12/2019] [Revised: 01/04/2020] [Accepted: 01/05/2020] [Indexed: 12/19/2022]
|
127
|
Fan Y, Cui J, Zhu Q. Heterogeneous graph inference based on similarity network fusion for predicting lncRNA-miRNA interaction. RSC Adv 2020; 10:11634-11642. [PMID: 35496629 PMCID: PMC9050493 DOI: 10.1039/c9ra11043g] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/31/2019] [Accepted: 03/14/2020] [Indexed: 12/28/2022]
Abstract
LncRNA and miRNA are two non-coding RNA types that are popular in current research. LncRNA interacts with miRNA to regulate gene transcription, further affecting human health and disease. Accurate identification of lncRNA-miRNA interactions contributes to the in-depth study of the biological functions and mechanisms of non-coding RNA. However, relying on biological experiments to obtain interaction information is time-consuming and expensive. Given the rapid accumulation of gene information and the scarcity of computational methods, effective computational models for predicting lncRNA-miRNA interactions are urgently needed. In this work, we propose a heterogeneous graph inference method based on similarity network fusion (SNFHGILMI) to predict potential lncRNA-miRNA interactions. First, we calculated multiple similarity data, including lncRNA sequence similarity, miRNA sequence similarity, lncRNA Gaussian kernel similarity, and miRNA Gaussian kernel similarity. Second, the similarity network fusion method was employed to integrate the data and obtain the similarity networks of lncRNA and miRNA. Then, we constructed a bipartite network by combining the known interaction network and the similarity networks of lncRNA and miRNA. Finally, the heterogeneous graph inference method was introduced to construct a prediction model. On the real dataset, SNFHGILMI achieved AUCs of 0.9501 and 0.9426 ± 0.0035 based on LOOCV and 5-fold cross-validation, respectively. Furthermore, case studies also demonstrate that SNFHGILMI is a high-performance prediction method that can accurately predict new lncRNA-miRNA interactions. The Matlab code and readme file of SNFHGILMI can be downloaded from https://github.com/cj-DaSE/SNFHGILMI.
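The Gaussian similarity step named in the first stage is usually computed as a Gaussian interaction profile kernel over the rows of the known interaction matrix. The sketch below assumes that standard formulation, with the bandwidth normalized by the mean squared norm of the profiles; the exact variant and parameter values used by SNFHGILMI are not stated in the abstract, so the function name, `gamma_prime = 1.0`, and the toy profiles are assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between entities.

    profiles : list of equal-length 0/1 rows; row i marks which partners
               entity i is known to interact with.
    Returns K with K[i][j] = exp(-gamma * ||profile_i - profile_j||^2),
    where gamma = gamma_prime / mean(||profile_i||^2).
    """
    n = len(profiles)
    sq_norms = [sum(x * x for x in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    gamma = gamma_prime / mean_sq if mean_sq else 0.0

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]


# Hypothetical lncRNA profiles over three miRNAs: the first two are identical.
lnc_profiles = [[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]]
K = gip_kernel(lnc_profiles)
```

Entities with identical interaction profiles get similarity 1, and the value decays toward 0 as the profiles diverge; applying the same function to the transposed matrix gives the miRNA-side kernel.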
Affiliation(s)
- Yongxian Fan
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Juan Cui
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- QingQi Zhu
- School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
|
128
|
Xiao Q, Zhang N, Luo J, Dai J, Tang X. Adaptive multi-source multi-view latent feature learning for inferring potential disease-associated miRNAs. Brief Bioinform 2020; 22:2043-2057. [PMID: 32186712 DOI: 10.1093/bib/bbaa028] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/13/2019] [Revised: 02/16/2020] [Accepted: 01/14/2020] [Indexed: 12/13/2022]
Abstract
Accumulating evidence has shown that microRNAs (miRNAs) play crucial roles in different biological processes, and their mutations and dysregulations have been shown to contribute to tumorigenesis. In silico identification of disease-associated miRNAs is a cost-effective strategy for discovering the most promising biomarkers for disease diagnosis and treatment. The increasingly available omics data sources provide unprecedented opportunities to decipher the underlying relationships between miRNAs and diseases with computational models. However, most existing methods are biased towards a single representation of miRNAs or diseases and are also not capable of discovering unobserved associations for new miRNAs or diseases without association information. In this study, we present a novel computational method with adaptive multi-source multi-view latent feature learning (M2LFL) to infer potential disease-associated miRNAs. First, we adopt multiple data sources to obtain similarity profiles and capture different latent features according to the geometric characteristics of the miRNA and disease spaces. Then, the multi-modal latent features are projected to a common subspace to discover unobserved miRNA-disease associations in both miRNA and disease views, and an adaptive joint graph regularization term is developed to preserve the intrinsic manifold structures of multiple similarity profiles. Meanwhile, Lp,q-norms are imposed on the projection matrices to ensure sparsity and improve interpretability. The experimental results confirm the superior performance of our proposed method in screening reliable candidate disease miRNAs, which suggests that M2LFL could be an efficient tool to discover diagnostic biomarkers for guiding laborious clinical trials.
|
129
|
Yan C, Wu FX, Wang J, Duan G. PESM: predicting the essentiality of miRNAs based on gradient boosting machines and sequences. BMC Bioinformatics 2020; 21:111. [PMID: 32183740 PMCID: PMC7079416 DOI: 10.1186/s12859-020-3426-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2019] [Accepted: 02/21/2020] [Indexed: 11/16/2022]
Abstract
Background MicroRNAs (miRNAs) are small noncoding RNA molecules that directly regulate their mRNA targets post-transcriptionally. Studies have indicated that miRNAs play key roles in complex diseases by taking part in many biological processes, such as cell growth and cell death. Therefore, in order to improve the effectiveness of disease diagnosis and treatment, it is appealing to develop advanced computational methods for predicting the essentiality of miRNAs. Results In this study, we propose a method (PESM) to predict miRNA essentiality based on gradient boosting machines and miRNA sequences. First, PESM extracts the sequence and structural features of miRNAs. Then it uses gradient boosting machines to predict the essentiality of miRNAs. We conduct 5-fold cross-validation to assess the prediction performance of our method. The area under the receiver operating characteristic curve (AUC), F-measure and accuracy (ACC) are used as the metrics to evaluate the prediction performance. We also compare PESM with three other competing methods: miES, Gaussian Naive Bayes and Support Vector Machine. Conclusion The experimental results show that PESM achieves better prediction performance (AUC: 0.9117, F-measure: 0.8572, ACC: 0.8516) than the three competing methods. In addition, the relative importance of all features further shows that the newly added features help to improve prediction performance.
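For readers comparing the figures reported across these abstracts, the three metrics named above can be computed from labels and scores as in the following sketch. It uses the textbook definitions (accuracy and F-measure from a thresholded confusion matrix, AUC as the fraction of positive-negative pairs ranked correctly); the 0.5 threshold, the function names, and the toy data are assumptions, and the numbers bear no relation to the published results.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F-measure (F1) from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores for six samples.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]
acc, f1 = accuracy_and_f1(labels, predictions)
area = auc(labels, scores)
```

Accuracy and F-measure depend on the chosen threshold, whereas AUC depends only on the ranking of the scores, which is why the abstracts report them side by side.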
Affiliation(s)
- Cheng Yan
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China; School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000, China
| | - Fang-Xiang Wu
- Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
| | - Guihua Duan
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
| |
|
130
|
Xie W, Luo J, Pan C, Liu Y. SG-LSTM-FRAME: a computational frame using sequence and geometrical information via LSTM to predict miRNA-gene associations. Brief Bioinform 2020; 22:2032-2042. [PMID: 32181478 DOI: 10.1093/bib/bbaa022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/29/2019] [Revised: 02/10/2020] [Accepted: 02/11/2020] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION: MicroRNAs (miRNAs) regulate target genes and are responsible for lethal diseases such as cancers. Accurately recognizing and identifying miRNA and gene pairs could be helpful in deciphering the mechanism by which miRNA affects and regulates the development of cancers. Embedding methods and deep learning methods have shown excellent performance in traditional classification tasks in many scenarios, but few attempts have adapted and merged these two methods for miRNA-gene relationship prediction. Hence, we proposed a novel computational framework. We first generated representational features for miRNAs and genes using both sequence and geometrical information and then leveraged a deep learning method for association prediction. RESULTS: We used long short-term memory (LSTM) to predict potential relationships and showed that our method outperformed other state-of-the-art methods. Our framework SG-LSTM achieved an area under the curve of 0.94 and was superior to other methods. In the case study, we predicted the top 10 miRNA-gene relationships and recommended the top 10 potential genes for hsa-miR-335-5p for SG-LSTM-core. We also tested our model using a larger dataset, from which 14 668 698 miRNA-gene pairs were predicted. The top 10 unknown pairs were also listed. AVAILABILITY: Our work can be downloaded from https://github.com/Xshelton/SG_LSTM. CONTACT: luojiawei@hnu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
- Weidun Xie
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
| | - Chu Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
| | - Ying Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
| |
|
131
|
Liu H, Ren G, Chen H, Liu Q, Yang Y, Zhao Q. Predicting lncRNA–miRNA interactions based on logistic matrix factorization with neighborhood regularized. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105261] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Indexed: 01/15/2023]
|
132
|
Gao Z, Wang YT, Wu QW, Ni JC, Zheng CH. Graph regularized L2,1-nonnegative matrix factorization for miRNA-disease association prediction. BMC Bioinformatics 2020; 21:61. [PMID: 32070280 PMCID: PMC7029547 DOI: 10.1186/s12859-020-3409-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/28/2019] [Accepted: 02/11/2020] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND: The aberrant expression of microRNAs is closely connected to the occurrence and development of a great many human diseases. To study human diseases, researchers have presented numerous effective computational models. RESULTS: Here, we present a computational framework based on graph Laplacian regularized L2,1-nonnegative matrix factorization (GRL2,1-NMF) for inferring possible human disease-connected miRNAs. First, manually validated disease-connected microRNAs were integrated, and microRNA functional similarity information along with two kinds of disease semantic similarities were calculated. Next, we measured Gaussian interaction profile (GIP) kernel similarities for both diseases and microRNAs. Then, we adopted a preprocessing step, namely, weighted K nearest known neighbours (WKNKN), to decrease the sparsity of the miRNA-disease association matrix. Finally, the GRL2,1-NMF framework was used to predict links between microRNAs and diseases. CONCLUSIONS: The new method (GRL2,1-NMF) achieved AUC values of 0.9280 and 0.9276 in global leave-one-out cross validation (global LOOCV) and five-fold cross validation (5-CV), respectively, showing that GRL2,1-NMF can powerfully discover potential disease-related miRNAs, even if there is no known associated disease.
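The Gaussian interaction profile (GIP) kernel mentioned above can be illustrated with a short sketch. It uses the commonly cited definition, K(i, j) = exp(-γ‖a_i − a_j‖²) with γ = γ' / mean_k ‖a_k‖², where a_i is the binary association profile (row) of entity i. The abstract does not give the formula, so this definition, the function name `gip_kernel` and the default `gamma_prime=1.0` are assumptions rather than the paper's exact settings.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of a 0/1 association matrix.

    profiles[i] is the interaction profile of entity i (for example, one miRNA's
    associations across all diseases). The bandwidth is normalised by the mean
    squared norm of the profiles, as in the commonly used GIP definition.
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel
```

Applying the function to the association matrix gives the similarity for one side (for example miRNAs); applying it to the transposed matrix gives the similarity for the other side (diseases).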
Affiliation(s)
- Zhen Gao
- School of Software, Qufu Normal University, Qufu, 273165, China
| | - Yu-Tian Wang
- School of Software, Qufu Normal University, Qufu, 273165, China
| | - Qing-Wen Wu
- School of Software, Qufu Normal University, Qufu, 273165, China
| | - Jian-Cheng Ni
- School of Software, Qufu Normal University, Qufu, 273165, China.
| | - Chun-Hou Zheng
- School of Software, Qufu Normal University, Qufu, 273165, China.
| |
|
133
|
Wang S, Li J. Modular within and between score for drug response prediction in cancer cell lines. Mol Omics 2020; 16:31-38. [PMID: 31802092 DOI: 10.1039/c9mo00162j] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Drug response prediction in cancer cell lines is vital for discovering new anticancer drugs. However, accurately predicting drug responses in cancer cell lines remains a challenging task. In this study, we presented a novel computational approach, named MSDRP (modular within and between score for drug response prediction), to predict drug responses in cell lines. The method is based on a constructed heterogeneous drug-cell line network that integrates multiple types of information. Compared with other state-of-the-art methods, MSDRP achieved better predictive performance and identified potential associations between drugs and cell lines that have been confirmed by the published literature. The source code of MSDRP is freely available at https://github.com/shimingwang1994/MSDRP.git.
Affiliation(s)
- Shiming Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
| | | |
|
134
|
Peng LH, Zhou LQ, Chen X, Piao X. A Computational Study of Potential miRNA-Disease Association Inference Based on Ensemble Learning and Kernel Ridge Regression. Front Bioeng Biotechnol 2020; 8:40. [PMID: 32117922 PMCID: PMC7015868 DOI: 10.3389/fbioe.2020.00040] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 11/20/2019] [Accepted: 01/17/2020] [Indexed: 12/11/2022] Open
Abstract
As a growing number of experimental studies have shown that microRNAs (miRNAs) are closely related to multiple biological processes and to the prevention, diagnosis and treatment of human diseases, more researchers are focusing on the identification of associations between miRNAs and diseases. Identifying such associations purely via experiments is costly and demanding, which prompts researchers to develop computational methods to complement the experiments. In this paper, a novel prediction model named Ensemble of Kernel Ridge Regression based MiRNA-Disease Association prediction (EKRRMDA) was developed. EKRRMDA obtained features of miRNAs and diseases by integrating the disease semantic similarity, the miRNA functional similarity and the Gaussian interaction profile kernel similarity for diseases and miRNAs. Under a computational framework that utilized ensemble learning and feature dimensionality reduction, multiple base classifiers, each combining two Kernel Ridge Regression classifiers from the miRNA side and the disease side, respectively, were obtained based on random selection of features. An averaging strategy over these base classifiers was then adopted to obtain the final association scores of miRNA-disease pairs. In the global and local leave-one-out cross validation, EKRRMDA attained AUCs of 0.9314 and 0.8618, respectively. Moreover, the model’s average AUC with standard deviation in 5-fold cross validation was 0.9275 ± 0.0008. In addition, we implemented three different types of case studies on predicting miRNAs associated with five important diseases. As a result, 90% (Esophageal Neoplasms), 86% (Kidney Neoplasms), 86% (Lymphoma), 98% (Lung Neoplasms), and 96% (Breast Neoplasms) of the top 50 predicted miRNAs were verified to have associations with these diseases.
Affiliation(s)
- Li-Hong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Li-Qian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Xue Piao
- School of Medical Informatics, Xuzhou Medical University, Xuzhou, China
| |
|
135
|
Ha J, Park C, Park C, Park S. IMIPMF: Inferring miRNA-disease interactions using probabilistic matrix factorization. J Biomed Inform 2020; 102:103358. [DOI: 10.1016/j.jbi.2019.103358] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/14/2019] [Revised: 11/11/2019] [Accepted: 12/12/2019] [Indexed: 12/09/2022]
|
136
|
Li J, Wang S, Chen Z, Wang Y. A Bipartite Network Module-Based Project to Predict Pathogen-Host Association. Front Genet 2020; 10:1357. [PMID: 32038713 PMCID: PMC6992693 DOI: 10.3389/fgene.2019.01357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/27/2019] [Accepted: 12/11/2019] [Indexed: 12/23/2022] Open
Abstract
Pathogen-host interactions play an important role in understanding the mechanism by which a pathogen can infect its host. Some approaches for predicting pathogen-host association have been developed, but prediction accuracy is still low. In this paper, we propose a bipartite network module-based approach to improve prediction accuracy. First, a bipartite network with pathogens and hosts is constructed. Next, pathogens and hosts are each divided into different modules. Then, modular information on the pathogens and hosts is added into a bipartite network projection model and the association scores between pathogens and hosts are calculated. Finally, leave-one-out cross-validation is used to estimate the performance of the proposed method. Experimental results show that the proposed method performs better in predicting pathogen-host association than other methods, and some potential pathogen-host associations with higher prediction scores are also confirmed by the results of biological experiments in the publicly available literature.
Affiliation(s)
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | | | | | | |
|
137
|
Wu M, Yang Y, Wang H, Ding J, Zhu H, Xu Y. IMPMD: An Integrated Method for Predicting Potential Associations Between miRNAs and Diseases. Curr Genomics 2020; 20:581-591. [PMID: 32581646 PMCID: PMC7290057 DOI: 10.2174/1389202920666191023090215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 10/16/2019] [Indexed: 01/06/2023] Open
Abstract
Background: With the rapid development of biological research, microRNAs (miRNAs) have increasingly attracted worldwide attention. A growing number of biological studies and experiments have shown that miRNAs are related to the occurrence and development of many key biological processes that cause complex human diseases. Thus, identifying the associations between miRNAs and diseases is helpful for diagnosing these diseases. Although some studies have found a considerable number of associations between miRNAs and diseases, many associations still need to be identified. Experimental methods to uncover miRNA-disease associations are time-consuming and expensive. Therefore, effective computational methods are urgently needed to predict new associations. Methodology: In this work, we propose an integrated method for predicting potential associations between miRNAs and diseases (IMPMD). The enhanced similarity for miRNAs is obtained by combining functional similarity, Gaussian similarity and Jaccard similarity. For diseases, it is obtained by combining semantic similarity, Gaussian similarity and Jaccard similarity. Then, we use these two enhanced similarities to construct the features and calculate a cumulative score to choose robust features. Finally, general linear regression is applied to assign weights to the Support Vector Machine, K-Nearest Neighbor and Logistic Regression algorithms. Results: IMPMD obtains an AUC of 0.9386 in 10-fold cross-validation, which is better than most of the previous models. To further evaluate our model, we implement IMPMD on two types of case studies for lung cancer and breast cancer. 49 (Lung Cancer) and 50 (Breast Cancer) out of the top 50 related miRNAs are validated by experimental discoveries. Conclusion: We built a software tool named IMPMD, which can be freely downloaded from https://github.com/Sunmile/IMPMD.
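Of the three similarities combined in this abstract, the Jaccard similarity is the simplest to illustrate. The sketch below assumes the standard set-based definition applied to each miRNA's set of known associated diseases; the abstract does not state the exact formulation IMPMD uses, and the names `jaccard`, `jaccard_matrix` and the example identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty profiles have similarity 0
    return len(a & b) / len(union)


def jaccard_matrix(associations):
    """Pairwise Jaccard similarity between entities.

    associations maps each entity (e.g. a miRNA name) to the set of partners
    (e.g. diseases) it is known to be associated with.
    """
    names = sorted(associations)
    return {
        (x, y): jaccard(associations[x], associations[y])
        for x in names
        for y in names
    }
```

For example, two miRNAs that share one of three distinct associated diseases would receive a similarity of 1/3. The same function applied to disease-to-miRNA sets gives the disease-side similarity.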
Affiliation(s)
- Meiqi Wu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| | - Yingxi Yang
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| | - Hui Wang
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| | - Jun Ding
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| | - Huan Zhu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| | - Yan Xu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China; Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
| |
|
138
|
Li J, Zhang S, Liu T, Ning C, Zhang Z, Zhou W. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020; 36:2538-2546. [DOI: 10.1093/bioinformatics/btz965] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 09/13/2019] [Revised: 12/17/2019] [Accepted: 12/31/2019] [Indexed: 12/26/2022] Open
Abstract
Motivation: Predicting the association between microRNAs (miRNAs) and diseases plays an important role in identifying human disease-related miRNAs. As identification of miRNA-disease associations via biological experiments is time-consuming and expensive, computational methods are currently used as effective complements to determine the potential associations between disease and miRNA. Results: We present a novel method of neural inductive matrix completion with graph convolutional network (NIMCGCN) for predicting miRNA-disease association. NIMCGCN first uses graph convolutional networks to learn miRNA and disease latent feature representations from the miRNA and disease similarity networks. Then, the learned features are input into a novel neural inductive matrix completion (NIMC) model to generate a completed association matrix. The parameters of NIMCGCN were learned from the known miRNA-disease association data in a supervised end-to-end way. We compared the proposed method with other state-of-the-art methods. The area under the receiver operating characteristic curve results showed that our method is significantly superior to existing methods. Furthermore, 50, 47 and 48 of the top 50 predicted miRNAs for three high-risk human diseases, namely, colon cancer, lymphoma and kidney cancer, were verified using experimental literature. Finally, 100% prediction accuracy was achieved when breast cancer was used as a case study to evaluate the ability of NIMCGCN to predict a new disease without any known related miRNAs. Availability and implementation: https://github.com/ljatynu/NIMCGCN/ Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jin Li
- School of Software, Yunnan University, Kunming 650091, China
| | - Sai Zhang
- School of Software, Yunnan University, Kunming 650091, China
| | - Tao Liu
- School of Software, Yunnan University, Kunming 650091, China
| | - Chenxi Ning
- School of Software, Yunnan University, Kunming 650091, China
| | - Zhuoxuan Zhang
- School of Software, Yunnan University, Kunming 650091, China
| | - Wei Zhou
- School of Software, Yunnan University, Kunming 650091, China
| |
|
139
|
Potential miRNA-disease association prediction based on kernelized Bayesian matrix factorization. Genomics 2020; 112:809-819. [DOI: 10.1016/j.ygeno.2019.05.021] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/09/2019] [Revised: 05/09/2019] [Accepted: 05/24/2019] [Indexed: 12/19/2022]
|
140
|
Qu J, Zhao Y, Zhang L, Cai SB, Ming Z, Wang CC. Computational Models for Self-Interacting Proteins Prediction. Protein Pept Lett 2019; 27:392-399. [PMID: 31880240 DOI: 10.2174/0929866527666191227141713] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/30/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 11/22/2022]
Abstract
Self-Interacting Proteins (SIPs), in which two or more copies of the protein can interact with each other, have significant roles in cellular functions and the evolution of Protein Interaction Networks (PINs). Knowing whether a protein can act on itself is important for understanding its functions. Previous studies on SIPs have focused on their structures and functions, while their overall properties have received less emphasis. Identifying SIPs is therefore an important task in biomedical research, which will help in understanding the function and mechanism of proteins. It is worth noting that high-throughput methods can be used for SIP prediction, but they can be costly, time-consuming and challenging. Therefore, computational models for the identification of SIPs are urgently needed. In this review, the concept and function of SIPs are introduced in detail. We further introduce SIP data and some excellent computational models that have been designed for SIP prediction. In particular, most existing approaches were developed based on machine learning using different feature extraction methods. Finally, we discuss several difficult problems in developing computational models for SIP prediction.
Affiliation(s)
- Jia Qu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Shu-Bin Cai
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Zhong Ming
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| |
|
141
|
Associating lncRNAs with small molecules via bilevel optimization reveals cancer-related lncRNAs. PLoS Comput Biol 2019; 15:e1007540. [PMID: 31877126 PMCID: PMC6948815 DOI: 10.1371/journal.pcbi.1007540] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/05/2018] [Revised: 01/08/2020] [Accepted: 11/12/2019] [Indexed: 12/28/2022] Open
Abstract
Long noncoding RNA (lncRNA) transcripts have emerging impacts in cancer studies, which suggests their potential as novel therapeutic agents. However, the molecular mechanism behind their treatment effects is still unclear. Here, we designed a computational model to Associate LncRNAs with Anti-Cancer Drugs (ALACD) based on a bilevel optimization model, which optimized the gene signature overlap in the upper level and imputed the missing lncRNA-gene association in the lower level. ALACD predicts genes coexpressed with lncRNAs while matching the drug’s gene signatures. This model allows us to borrow the target gene information of small molecules to understand the mechanisms of action of lncRNAs and their roles in cancer. The ALACD model was systematically applied to the 10 cancer types in The Cancer Genome Atlas (TCGA) that had matched lncRNA and mRNA expression data. Cancer type-specific lncRNAs and associated drugs were identified. These lncRNAs show significantly different expression levels in cancer patients. Follow-up functional and molecular pathway analyses suggest that the gene signatures bridging drugs and lncRNAs are closely related to cancer development. Importantly, patient survival information and evidence from the literature suggest that the lncRNAs and drug-lncRNA associations identified by the ALACD model can provide an alternative choice for targeted cancer treatment and potential cancer prognostic biomarkers. The ALACD model is freely available at https://github.com/wangyc82/ALACD-v1. LncRNAs are RNA transcripts that are longer than 200 bp and do not encode proteins. Recent experimental studies have indicated the crucial role of lncRNAs in cancer. We proposed a computational model, ALACD, to understand a lncRNA’s molecular mechanism by associating it with a drug through the drug’s target genes. ALACD reveals lncRNAs, the associated anti-cancer drugs, and the induced gene signatures that are involved in the regulation of cancer.
Furthermore, these cancer-related lncRNAs are differentially expressed in cancer patients and closely associated with patient survival.
|
142
|
MiRNA-disease interaction prediction based on kernel neighborhood similarity and multi-network bidirectional propagation. BMC Med Genomics 2019; 12:185. [PMID: 31865912 PMCID: PMC6927119 DOI: 10.1186/s12920-019-0622-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/22/2023] Open
Abstract
Background: Studies have shown that miRNAs are functionally associated with the development of many human diseases, but the roles of miRNAs in diseases and their underlying molecular mechanisms are not fully understood. Research on miRNA-disease interactions has received increasing attention. Compared with complex and costly biological experiments, computational methods can rapidly and efficiently predict potential miRNA-disease interactions and can serve as a beneficial supplement to experimental methods. Results: In this paper, we proposed a novel computational model of kernel neighborhood similarity and multi-network bidirectional propagation (KNMBP) for miRNA-disease interaction prediction, especially for new miRNAs and new diseases. First, we integrated multiple data sources of diseases and miRNAs, respectively, to construct a novel disease semantic similarity network and miRNA functional similarity network. Second, based on the modified miRNA-disease interactions, we used the kernel neighborhood similarity algorithm to calculate the disease kernel neighborhood similarity and the miRNA kernel neighborhood similarity. Finally, we used a bidirectional propagation algorithm to predict the miRNA-disease interaction scores based on the integrated disease similarity network and miRNA similarity network. As a result, the AUC value of 5-fold cross validation for all interactions by KNMBP is 0.93126 on the commonly used dataset, and the AUC values for all interactions, for all miRNAs and for all diseases are 0.93795, 0.86363 and 0.86937, respectively, on another dataset that we extracted ourselves; these values are higher than those of other state-of-the-art methods. In addition, our model has good parameter robustness. The case study further demonstrated the predictive performance of the model for novel miRNA-disease interactions.
Conclusions: Our KNMBP algorithm efficiently integrates multiple omics data from miRNAs and diseases to stably and efficiently predict potential miRNA-disease interactions. It is anticipated that KNMBP will be a useful tool in biomedical research.
|
143
|
An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinformatics 2019; 20:624. [PMID: 31795954 PMCID: PMC6889672 DOI: 10.1186/s12859-019-3290-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/18/2019] [Accepted: 11/21/2019] [Indexed: 01/29/2023] Open
Abstract
Background: A large body of evidence shows that miRNA regulates the expression of its target genes at the post-transcriptional level and that the dysregulation of miRNA is related to many complex human diseases. Accurately discovering disease-related miRNAs is conducive to exploring the pathogenesis and treatment of diseases. However, because experimental methods are time-consuming and expensive, predicting miRNA-disease associations with computational models has become a more economical and effective means. Results: Inspired by the work of predecessors, we proposed an improved computational model based on random forest (RF) for identifying miRNA-disease associations (IRFMDA). First, the integrated similarity of diseases was calculated by combining their semantic similarity and Gaussian interaction profile kernel (GIPK) similarity, and the integrated similarity of miRNAs was calculated by combining their functional similarity and GIPK similarity. Then, the integrated similarity of diseases and the integrated similarity of miRNAs were combined to represent each miRNA-disease relationship pair. Next, the miRNA-disease relationship pairs contained in the HMDD (v2.0) database were considered positive samples, and randomly constructed miRNA-disease relationship pairs not included in HMDD (v2.0) were considered negative samples. Feature selection based on the variable importance score of RF was then performed to choose more useful features for representing samples, in order to optimize the model’s ability to infer miRNA-disease associations. Finally, an RF regression model was trained on the reduced sample space to score the unknown miRNA-disease associations. The AUCs of IRFMDA under local leave-one-out cross-validation (LOOCV), global LOOCV and 5-fold cross-validation were 0.8728, 0.9398 and 0.9363, respectively, which were better than those of several excellent models for predicting miRNA-disease associations.
Moreover, case studies on oesophageal cancer, lymphoma and lung cancer showed that 94 (oesophageal cancer), 98 (lymphoma) and 100 (lung cancer) of the top 100 disease-associated miRNAs predicted by IRFMDA were supported by the experimental data in the dbDEMC (v2.0) database. Conclusions: Cross-validation and case studies demonstrated that IRFMDA is an excellent miRNA-disease association prediction model, and can provide guidance and help for future experimental studies on the regulatory mechanism of miRNAs in complex human diseases.
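The sample-construction step described above (known pairs as positives, randomly drawn unknown pairs as negatives, each pair represented by combined miRNA and disease similarity vectors) might look roughly like the following. This is a hedged sketch rather than the IRFMDA implementation: drawing as many negatives as positives, representing a pair by concatenating the two similarity rows, and the name `build_samples` are all assumptions.

```python
import random


def build_samples(assoc, mirna_sim, disease_sim, seed=0):
    """Build a labelled training set from a 0/1 association matrix.

    assoc[i][j] == 1 means miRNA i is known to be associated with disease j.
    mirna_sim and disease_sim are square integrated-similarity matrices.
    Returns (X, y): feature vectors and labels (1 = known pair, 0 = sampled
    unknown pair). Negatives are sampled without replacement.
    """
    n_mirna, n_disease = len(assoc), len(assoc[0])
    positives = [(i, j) for i in range(n_mirna) for j in range(n_disease)
                 if assoc[i][j] == 1]
    unknown = [(i, j) for i in range(n_mirna) for j in range(n_disease)
               if assoc[i][j] == 0]

    # Assumption: a balanced set, i.e. as many negatives as positives.
    rng = random.Random(seed)
    negatives = rng.sample(unknown, min(len(positives), len(unknown)))

    def features(i, j):
        # Assumption: a pair is represented by the miRNA's similarity row
        # followed by the disease's similarity row.
        return list(mirna_sim[i]) + list(disease_sim[j])

    X = [features(i, j) for i, j in positives + negatives]
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y
```

The resulting `X` and `y` could then be passed to any random forest implementation for feature-importance ranking and regression; that part is omitted here because the abstract does not specify the settings.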
|
144
|
Prediction of potential miRNA-disease associations using matrix decomposition and label propagation. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.104963] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/26/2023]
|
145
|
Zhao Y, Chen X, Yin J, Qu J. SNMFSMMA: using symmetric nonnegative matrix factorization and Kronecker regularized least squares to predict potential small molecule-microRNA association. RNA Biol 2019; 17:281-291. [PMID: 31739716 DOI: 10.1080/15476286.2019.1694732] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 01/07/2023] Open
Abstract
Accumulating studies have shown that microRNAs (miRNAs) could be used as targets of small-molecule (SM) drugs to treat diseases. In recent years, researchers have proposed many computational models to reveal miRNA-SM associations because of the huge cost of experimental methods. Considering the shortcomings of previous models, some of which have low prediction accuracy and some of which cannot be applied to new SMs (miRNAs), we developed a novel model named Symmetric Nonnegative Matrix Factorization for Small Molecule-MiRNA Association prediction (SNMFSMMA). Unlike models that directly apply the integrated similarities, SNMFSMMA first performed matrix decomposition on the integrated similarity matrices and calculated the Kronecker product of the new integrated similarity matrices to obtain the SM-miRNA pair similarity. Further, we applied regularized least squares to obtain the mapping function from SM-miRNA pairs to association probabilities by minimizing the objective function. On the basis of Datasets 1 and 2 extracted from the SM2miR v1.0 database, we implemented global leave-one-out cross validation (LOOCV), miRNA-fixed local LOOCV, SM-fixed local LOOCV and 5-fold cross-validation to evaluate the prediction performance. The AUC values obtained by SNMFSMMA in these validations reached 0.9711 (0.8895), 0.9698 (0.8884), 0.8329 (0.7651) and 0.9644 ± 0.0035 (0.8814 ± 0.0033) based on Dataset 1 (Dataset 2), respectively. In the first case study, 5 of the top 10 predicted associations were confirmed. In the second, 7 and 8 of the top 10 predicted miRNAs related to 5-FU and 5-Aza-2'-deoxycytidine, respectively, were confirmed. These results demonstrated the reliable predictive power of SNMFSMMA.
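The step of turning two entity-level similarity matrices into a pair-level similarity via the Kronecker product can be shown with a small pure-Python sketch. It is illustrative only: the function names and the pair-indexing convention (pair (s, m) at index s * n_mirna + m) are assumptions, and the matrix factorization and regularized least squares stages of SNMFSMMA are not reproduced.

```python
def kronecker(A, B):
    """Kronecker product of two matrices given as lists of rows.

    The entry at row i*p + k, column j*q + l equals A[i][j] * B[k][l],
    where B has p rows and q columns.
    """
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


def pair_index(sm, mirna, n_mirna):
    """Index of the (small molecule, miRNA) pair in the Kronecker matrix."""
    return sm * n_mirna + mirna


# Toy similarity matrices: 2 small molecules, 2 miRNAs (values are made up).
sm_sim = [[1.0, 0.5], [0.5, 1.0]]
mirna_sim = [[1.0, 0.2], [0.2, 1.0]]
pair_sim = kronecker(sm_sim, mirna_sim)  # 4 x 4 pair-pair similarity
```

Under this convention, the similarity between pair (SM 0, miRNA 0) and pair (SM 1, miRNA 1) is `pair_sim[pair_index(0, 0, 2)][pair_index(1, 1, 2)]`, which is the product of the two entity-level similarities. The pair matrix grows as (n_sm × n_mirna)², so practical implementations usually avoid materializing it.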
Affiliation(s)
- Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jun Yin
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jia Qu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
|
146
|
Wan H, Li JM, Ding H, Lin SX, Tu SQ, Tian XH, Hu JP, Chang S. An Overview of Computational Tools of Nucleic Acid Binding Site Prediction for Site-specific Proteins and Nucleases. Protein Pept Lett 2019; 27:370-384. [PMID: 31746287 DOI: 10.2174/0929866526666191028162302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/31/2019] [Revised: 05/24/2019] [Accepted: 09/24/2019] [Indexed: 12/26/2022]
Abstract
Understanding the interaction mechanism of proteins and nucleic acids is one of the most fundamental problems for genome editing with engineered nucleases. Because experimental investigations have limitations, computational methods have played an important role in building knowledge of protein-nucleic acid interactions. Over the past few years, dozens of computational tools have been used to identify nucleic acid binding sites for site-specific proteins and to design site-specific nucleases because of their significant advantages in genome editing. Here, we review existing widely used computational tools for target prediction of site-specific proteins as well as off-target prediction of site-specific nucleases. This article provides a list of on-line prediction tools according to their features, followed by a description of the computational methods used by these tools, which range from various sequence mapping algorithms (like Bowtie, FetchGWI and BLAST) to different machine learning methods (such as Support Vector Machine, hidden Markov models, Random Forest, elastic network and deep neural networks). We also make suggestions for further development to improve the accuracy of prediction methods. This survey will provide a reference guide for computational biologists working in the field of genome editing.
Affiliation(s)
- Hua Wan
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Jian-Ming Li
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Huang Ding
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Shuo-Xin Lin
- Department of Electrical and Computer Engineering, James Clark School of Engineering, University of Maryland, College Park, MD 20742, United States
- Shu-Qin Tu
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Xu-Hong Tian
- College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Jian-Ping Hu
- College of Pharmacy and Biological Engineering, Sichuan Industrial Institute of Antibiotics, Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, Antibiotics Research and Re-Evaluation Key Laboratory of Sichuan Province, Chengdu University, Chengdu 610106, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
|
147
|
Lei X, Tie J. Prediction of disease-related metabolites using bi-random walks. PLoS One 2019; 14:e0225380. [PMID: 31730648 PMCID: PMC6857945 DOI: 10.1371/journal.pone.0225380] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/07/2019] [Accepted: 11/04/2019] [Indexed: 12/25/2022] Open
Abstract
Metabolites play a significant role in various complex human diseases. Exploring the relationship between metabolites and diseases can help us better understand the underlying pathogenesis. Several network-based methods have been used to predict associations between metabolites and diseases. However, some methods ignored hierarchical differences in the disease network and failed to work in the absence of known metabolite-disease associations. This paper presents a bi-random walk based method for predicting disease-related metabolites, called MDBIRW. First, we reconstruct the disease similarity network and the metabolite functional similarity network by integrating Gaussian Interaction Profile (GIP) kernel similarity of diseases and GIP kernel similarity of metabolites, respectively. Then, the bi-random walk algorithm is executed on the reconstructed disease similarity network and metabolite functional similarity network to predict potential disease-metabolite associations. MDBIRW achieves reliable performance in leave-one-out cross validation (AUC of 0.910) and 5-fold cross validation (AUC of 0.924). The experimental results show that our method outperforms other existing methods for predicting disease-related metabolites.
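The two ingredients named in this abstract, GIP kernel similarity and a bi-random walk over two similarity networks, can be sketched as follows. This is an illustrative NumPy sketch rather than the MDBIRW code: the bandwidth normalization follows the commonly used GIP formulation, and the symmetric degree normalization, the restart weight `alpha` and the step limits `left` and `right` are assumptions, since the abstract does not state them.

```python
import numpy as np

def gip_kernel(A, gamma_prime=1.0):
    """Gaussian interaction profile similarity between the rows of A."""
    sq_norms = (A ** 2).sum(axis=1)
    gamma = gamma_prime / sq_norms.mean()   # bandwidth scaled by mean profile norm
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def normalize(S):
    """Symmetric degree normalization D^{-1/2} S D^{-1/2} (assumed choice)."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def bi_random_walk(A, S_row, S_col, alpha=0.3, left=2, right=2):
    """Walk on the row network and the column network, averaging both results."""
    R0 = A / A.sum()                        # known associations as the restart term
    R = R0.copy()
    S_r, S_c = normalize(S_row), normalize(S_col)
    for step in range(1, max(left, right) + 1):
        terms = []
        if step <= left:                    # walk on the row (e.g. metabolite) network
            terms.append(alpha * S_r @ R + (1 - alpha) * R0)
        if step <= right:                   # walk on the column (e.g. disease) network
            terms.append(alpha * R @ S_c + (1 - alpha) * R0)
        R = sum(terms) / len(terms)
    return R

# Toy metabolite-by-disease association matrix (hypothetical data).
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
K_metabolite = gip_kernel(A)      # similarity between rows (metabolites)
K_disease = gip_kernel(A.T)       # similarity between columns (diseases)
scores = bi_random_walk(A, K_metabolite, K_disease)
```

Using different step limits for the two networks lets the walk go further on the network that carries more reliable similarity information.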
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Jiaojiao Tie
- School of Computer Science, Shaanxi Normal University, Xi’an, China
|
148
|
Li S, Xie M, Liu X. A Novel Approach Based on Bipartite Network Recommendation and KATZ Model to Predict Potential Micro-Disease Associations. Front Genet 2019; 10:1147. [PMID: 31803235 PMCID: PMC6873782 DOI: 10.3389/fgene.2019.01147] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/19/2019] [Accepted: 10/21/2019] [Indexed: 12/24/2022] Open
Abstract
Accumulating evidence indicates that the microbes colonizing human bodies have crucial effects on human health, and that the discovery of disease-related microbes will promote the discovery of biomarkers and drugs for the prevention, diagnosis, treatment, and prognosis of diseases. However, clinical experiments on disease-microbe associations are time-consuming, laborious and expensive, and there are few methods for predicting potential microbe-disease associations. Therefore, developing effective computational models that use the accumulated public data of clinically validated microbe-disease associations to identify novel disease-microbe associations is of practical importance. We propose a novel method based on the KATZ model and Bipartite Network Recommendation Algorithm (KATZBNRA) to discover potential associations between microbes and diseases. We calculate the Gaussian interaction profile kernel similarity of diseases and microbes based on validated disease-microbe associations. Then, we construct a bipartite graph and execute a bipartite network recommendation algorithm. Finally, we integrate the disease similarity, microbe similarity and bipartite network recommendation score to obtain the final score, which is used to infer novel disease-microbe interactions. To evaluate the predictive power of KATZBNRA, we tested it with walk length 2 using global leave-one-out cross validation (LOOCV), two-fold and five-fold cross validations, with AUCs of 0.9098, 0.8463 and 0.8969, respectively. The test results also show that KATZBNRA is more accurate than two recent similar methods, KATZHMDA and BNPMDA.
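The KATZ component with walk length 2 can be illustrated on a small heterogeneous network assembled from two similarity matrices and the known associations. The sketch below covers only a generic Katz score; the bipartite network recommendation step and the final score integration of KATZBNRA are not specified in the abstract and are omitted. The function name `katz_scores` and the damping factor `beta` are assumptions made for illustration.

```python
import numpy as np

def katz_scores(A, K_row, K_col, beta=0.01, max_len=2):
    """Katz measure between row and column entities of the association matrix A.

    Sums beta**l times the number of (weighted) walks of length l, for
    l = 1..max_len, on the block network [[K_row, A], [A.T, K_col]].
    """
    n_row = A.shape[0]
    H = np.block([[K_row, A], [A.T, K_col]])   # heterogeneous adjacency matrix
    total = np.zeros_like(H)
    power = np.eye(H.shape[0])
    for length in range(1, max_len + 1):
        power = power @ H                       # walks of exactly `length` steps
        total += beta ** length * power
    return total[:n_row, n_row:]                # row-entity to column-entity block

# Toy microbe-by-disease data (hypothetical values).
A = np.array([[1, 0],
              [0, 1],
              [1, 1]], dtype=float)
K_microbe = np.array([[1.0, 0.2, 0.5],
                      [0.2, 1.0, 0.4],
                      [0.5, 0.4, 1.0]])
K_disease = np.array([[1.0, 0.3],
                      [0.3, 1.0]])
scores = katz_scores(A, K_microbe, K_disease, beta=0.01, max_len=2)
```

For walk length 2 the result reduces to `beta * A + beta**2 * (K_microbe @ A + A @ K_disease)`, which shows how the similarity matrices propagate known associations to neighboring entities.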
Affiliation(s)
- Shiru Li
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Xinqiu Liu
- Hunan Vocational College of Engineering, Changsha, China
|
149
|
Yi HC, You ZH, Guo ZH. Construction and Analysis of Molecular Association Network by Combining Behavior Representation and Node Attributes. Front Genet 2019; 10:1106. [PMID: 31788002 PMCID: PMC6854842 DOI: 10.3389/fgene.2019.01106] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/08/2019] [Accepted: 10/15/2019] [Indexed: 11/13/2022] Open
Abstract
A key aim of post-genomic biomedical research is to understand and model complex biomolecular activities from a systematic perspective. Biomolecular interactions are widespread and interrelated: multiple biomolecules coordinate to sustain life activities, and any disturbance of these complex connections can lead to abnormal life activities or complex diseases. However, much existing research focuses only on individual intermolecular interactions. In this work, we revealed, constructed, and analyzed a large-scale molecular association network of multiple biomolecules in human by integrating associations among lncRNAs, miRNAs, proteins, drugs, and diseases, in which various associations are interconnected and any type of association can be predicted. We propose Molecular Association Network (MAN)–High-Order Proximity preserved Embedding (HOPE), a novel network representation learning based method that fully exploits latent features of biomolecules to accurately predict associations between molecules. More specifically, the network representation learning algorithm HOPE was applied to learn behavior features of nodes in the association network. Attribute features of nodes were also adopted. Then, a machine learning model, CatBoost, was trained to predict potential associations between any nodes. The performance of our method was evaluated under five-fold cross validation. A case study predicting miRNA-disease associations was also conducted to verify the prediction capability. MAN-HOPE achieves a high accuracy of 93.3% and an area under the receiver operating characteristic curve of 0.9793. The experimental results demonstrate the novelty of our systematic understanding of intermolecular associations and enable systematic exploration of the landscape of molecular interactions that shape specialized cellular functions.
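The behavior-feature step can be illustrated with a small dense version of the HOPE idea: build a high-order proximity matrix (Katz proximity is used here) and factorize it so that the inner product of a source and a target embedding approximates the proximity. This is a sketch under stated assumptions, not the MAN-HOPE pipeline: it uses a plain dense SVD that is only practical for small graphs, the decay `beta` and the dimension `dim` are illustrative, and the attribute features and the CatBoost classifier are left out.

```python
import numpy as np

def hope_embedding(A, dim, beta=0.1):
    """Embed nodes so that source @ target.T approximates the Katz proximity."""
    n = A.shape[0]
    spectral_radius = np.abs(np.linalg.eigvals(A)).max()
    if beta * spectral_radius >= 1.0:
        raise ValueError("beta must be below 1 / spectral radius for Katz to converge")
    # Katz proximity: S = sum_{l>=1} (beta*A)^l = (I - beta*A)^{-1} (beta*A)
    S = np.linalg.solve(np.eye(n) - beta * A, beta * A)
    U, sigma, Vt = np.linalg.svd(S)
    root = np.sqrt(sigma[:dim])
    source = U[:, :dim] * root      # embedding of each node as a walk source
    target = Vt[:dim].T * root      # embedding of each node as a walk target
    return source, target

# Toy undirected association network with four nodes (hypothetical data).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
source, target = hope_embedding(A, dim=2, beta=0.1)
pair_feature = np.concatenate([source[0], target[3]])  # features for the pair (0, 3)
```

Concatenated pair features of this kind are what a downstream classifier would consume when scoring a candidate association.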
Affiliation(s)
- Hai-Cheng Yi
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Zhen-Hao Guo
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
|
150
|
Hong J, Luo Y, Mou M, Fu J, Zhang Y, Xue W, Xie T, Tao L, Lou Y, Zhu F. Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery. Brief Bioinform 2019; 21:1825-1836. [PMID: 31860715 DOI: 10.1093/bib/bbz120] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 06/23/2019] [Revised: 08/12/2019] [Accepted: 08/21/2019] [Indexed: 12/20/2022] Open
Abstract
The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature and can induce serious conditions by secreting type IV SS effectors (T4SEs) into host cells. Recent studies mainly focus on annotating new T4SEs from the large volume of sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to depend heavily on the selected methods, and their annotation performance needs to be further enhanced. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of popular T4SE annotation tools on an independent benchmark. Second, false discovery rates of various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting the real-world non-T4SEs validated using published experiments. Based on the above analyses, the encoding strategies (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) one-hot encoding scheme (Onehot) were identified as well-performing when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. Overall, this study conducted a comprehensive analysis of the performance of a collection of encoding strategies when integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.
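Of the three encodings singled out in this abstract, the one-hot scheme is the simplest to show. The sketch below turns a protein sequence into a fixed-size matrix suitable as CNN input. It is illustrative only: the fixed length `max_len`, zero padding, and the all-zero row for non-standard residues are assumptions, and the abstract does not give the input length or network architecture used by CNN-T4SE.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                 # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence, max_len=10):
    """Encode a protein sequence as a (max_len, 20) one-hot matrix.

    Sequences longer than max_len are truncated; shorter ones are zero padded.
    Residues outside the 20 standard amino acids become all-zero rows.
    """
    encoded = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, residue in enumerate(sequence[:max_len].upper()):
        idx = INDEX.get(residue)
        if idx is not None:
            encoded[pos, idx] = 1.0
    return encoded

# Hypothetical short peptide; "X" marks an unknown residue.
x = one_hot_encode("MKX", max_len=5)
```

Each row holds at most a single 1, so the matrix can be fed to a one-dimensional convolution that slides along the sequence axis.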
Affiliation(s)
- Jiajun Hong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Yan Lou
- Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, Zhejiang, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|