1
Hu X, Sun H, Shan L, Ma C, Quan H, Zhang Y, Zhang J, Fan Z, Tang Y, Deng L. Unraveling Disease-Associated PIWI-Interacting RNAs with a Contrastive Learning Methods. J Chem Inf Model 2025. [PMID: 40263714 DOI: 10.1021/acs.jcim.5c00173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2025]
Abstract
PIWI-interacting RNAs (piRNAs) are a class of small, non-coding RNAs predominantly expressed in the germ cells of animals and play a crucial role in maintaining genomic integrity, mediating transposon suppression, and ensuring gene stability. Beyond their functions in reproductive cells, piRNAs also play roles in various human diseases, including cancer, suggesting their potential as significant biomarkers critical for disease diagnosis and treatment. Wet-lab methods to identify piRNA-disease associations require substantial resources and are often hit-or-miss. With advancements in computational technologies, an increasing number of researchers are employing computational methods to efficiently predict potential piRNA-disease associations. However, the sparsity of data in piRNA-disease association studies significantly limits improvements in model performance. In this study, we propose a novel computational model, iPiDA_CL, to predict potential piRNA-disease associations through contrastive learning methods, which do not require negative samples. The model represents piRNA-disease association pairs as a bipartite graph and computes the initial embeddings of piRNAs and diseases using Gaussian kernel similarity, with features updated via LightGCN. Based on the siamese network framework, iPiDA_CL constructs online and target networks and employs data augmentation in the target network to build a contrastive learning objective that optimizes model parameters without introducing negative samples. Finally, cross-prediction methods are used to calculate specific piRNA-disease association scores. A series of experimental results demonstrate that iPiDA_CL surpasses state-of-the-art methods in both performance and computational efficiency. The application of iPiDA_CL to the miRNA-disease association dataset underscores its versatility across various ncRNA-disease association tasks. Furthermore, a case study highlights iPiDA_CL as an efficient and promising tool for predicting piRNA-disease associations.
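The Gaussian kernel similarity mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common "Gaussian interaction profile" convention, in which each row of a binary association matrix is a profile and the bandwidth is normalised by the mean squared profile norm; the abstract does not state which variant iPiDA_CL uses, and the function name and toy matrix below are hypothetical.

```python
import numpy as np

def gaussian_kernel_similarity(assoc: np.ndarray) -> np.ndarray:
    """Gaussian kernel similarity between the row profiles of `assoc`."""
    # Squared Euclidean distance between every pair of row profiles.
    sq_norms = (assoc ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    # Assumed convention: bandwidth is the inverse of the mean squared norm.
    mean_sq_norm = sq_norms.mean()
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    return np.exp(-gamma * sq_dist)

# Hypothetical toy data: 3 piRNAs (rows) x 4 diseases (columns).
A = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0]], dtype=float)
pirna_sim = gaussian_kernel_similarity(A)      # 3 x 3 piRNA similarity
disease_sim = gaussian_kernel_similarity(A.T)  # 4 x 4 disease similarity
```

Rows of such a similarity matrix could then serve as initial node embeddings, which is one plausible reading of the abstract rather than a confirmed detail of the method.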
Affiliation(s)
- Xiaowen Hu
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hao Sun
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Linchao Shan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Chenxi Ma
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hanming Quan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yuanpeng Zhang
  - School of Software, Xinjiang University, Urumqi 830049, China
- Jiaxuan Zhang
  - Department of Electrical and Computer Engineering, University of California, San Diego, California 92161, United States
- Ziyu Fan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yongjun Tang
  - Department of Pediatrics, Xiangya Hospital, Central South University, Changsha 410083, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
2
Cao X, Lu P. DCSGMDA: A dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations. Comput Biol Chem 2024; 113:108201. [PMID: 39255626 DOI: 10.1016/j.compbiolchem.2024.108201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 08/17/2024] [Accepted: 08/31/2024] [Indexed: 09/12/2024]
Abstract
Numerous studies have shown that microRNAs (miRNAs) play a key role in human diseases as critical biomarkers. Their abnormal expression is often accompanied by the emergence of specific diseases. Therefore, studying the relationship between miRNAs and diseases can deepen insight into disease pathogenesis, clarify the process of disease onset and development, and promote drug research for specific diseases. However, many undiscovered relationships between miRNAs and diseases remain, significantly limiting research on miRNA-disease correlations. To explore more potential correlations, we propose a dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations (DCSGMDA). First, we constructed similarity networks for miRNAs and diseases, as well as an association relationship network. Second, potential features were fully mined using stacked deep learning and gradient decomposition networks, along with dual-channel convolutional neural networks. Finally, correlations were scored by a multilayer perceptron. We performed 5-fold and 10-fold cross-validation experiments on DCSGMDA using two datasets based on the Human MicroRNA Disease Database (HMDD). Additionally, parametric, ablation, and comparative experiments, along with case studies, were conducted. The experimental results demonstrate that DCSGMDA performs well in predicting miRNA-disease associations.
Affiliation(s)
- Xu Cao
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China.
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China.
3
Ning Q, Zhao Y, Gao J, Chen C, Yin M. Hierarchical Hypergraph Learning in Association-Weighted Heterogeneous Network for miRNA-Disease Association Identification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2531-2542. [PMID: 39475747 DOI: 10.1109/tcbb.2024.3485788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development, and the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experimental results suggest that HHAWMD achieves better performance and can be used as a powerful tool for miRNA-disease association identification.
4
Xuan P, Wang W, Cui H, Wang S, Nakaguchi T, Zhang T. Mask-Guided Target Node Feature Learning and Dynamic Detailed Feature Enhancement for lncRNA-Disease Association Prediction. J Chem Inf Model 2024; 64:6662-6675. [PMID: 39112431 DOI: 10.1021/acs.jcim.4c00652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Identifying new relevant long noncoding RNAs (lncRNAs) for various human diseases can facilitate the exploration of the causes and progression of these diseases. Recently, several graph inference methods have been proposed to predict disease-related lncRNAs by exploiting the topological structure and node attributes within graphs. However, these methods did not prioritize the target lncRNA and disease nodes over auxiliary nodes like miRNA nodes, potentially limiting their ability to fully utilize the features of the target nodes. We propose a new method, mask-guided target node feature learning and dynamic detailed feature enhancement for lncRNA-disease association prediction (MDLD), to enhance node feature learning for improved lncRNA-disease association prediction. First, we designed a heterogeneous graph masked transformer autoencoder to guide feature learning, focusing more on the features of target lncRNA (disease) nodes. The target nodes were increasingly masked as training progressed, which helps develop a more robust prediction model. Second, we developed a graph convolutional network with dynamic residuals (GCNDR) to learn and integrate the heterogeneous topology and features of all lncRNA, disease, and miRNA nodes. GCNDR employs an interlayer residual strategy and a residual evolution strategy to mitigate oversmoothing caused by multilayer graph convolution. The interlayer residual strategy estimates the importance of node features learned in the previous GCN encoding layer for nodes in the current encoding layer. Additionally, since there are dependencies in the importance of features of individual lncRNA (disease, miRNA) nodes across multiple encoding layers, a gated recurrent unit-based strategy is proposed to encode these dependencies. Finally, we designed a perspective-level attention mechanism to obtain more informative features of lncRNA and disease node pairs from the perspectives of mask-enhanced and dynamic-enhanced node features. 
Cross-validation experimental results demonstrated that MDLD outperformed 10 other state-of-the-art prediction methods. Ablation experiments and case studies on candidate lncRNAs for three diseases further proved the technical contributions of MDLD and its capability to discover disease-related lncRNAs.
Affiliation(s)
- Ping Xuan
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Wei Wang
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
5
Gou F, Liu J, Xiao C, Wu J. Research on Artificial-Intelligence-Assisted Medicine: A Survey on Medical Artificial Intelligence. Diagnostics (Basel) 2024; 14:1472. [PMID: 39061610 PMCID: PMC11275417 DOI: 10.3390/diagnostics14141472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/28/2024] Open
Abstract
With the improvement of economic conditions and the increase in living standards, people's attention in regard to health is also continuously increasing. They are beginning to place their hopes on machines, expecting artificial intelligence (AI) to provide a more humanized medical environment and personalized services, thus greatly expanding the supply and bridging the gap between resource supply and demand. With the development of IoT technology, the arrival of the 5G and 6G communication era, and the enhancement of computing capabilities in particular, the development and application of AI-assisted healthcare have been further promoted. Currently, research on and the application of artificial intelligence in the field of medical assistance are continuously deepening and expanding. AI holds immense economic value and has many potential applications in regard to medical institutions, patients, and healthcare professionals. It has the ability to enhance medical efficiency, reduce healthcare costs, improve the quality of healthcare services, and provide a more intelligent and humanized service experience for healthcare professionals and patients. This study elaborates on AI development history and development timelines in the medical field, types of AI technologies in healthcare informatics, the application of AI in the medical field, and opportunities and challenges of AI in the field of medicine. The combination of healthcare and artificial intelligence has a profound impact on human life, improving human health levels and quality of life and changing human lifestyles.
Affiliation(s)
- Fangfang Gou
  - State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Jun Liu
  - The Second People's Hospital of Huaihua, Huaihua 418000, China
- Chunwen Xiao
  - The Second People's Hospital of Huaihua, Huaihua 418000, China
- Jia Wu
  - State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  - Research Center for Artificial Intelligence, Monash University, Melbourne, Clayton, VIC 3800, Australia
6
Qin C, Zhang J, Ma L. EMCMDA: predicting miRNA-disease associations via efficient matrix completion. Sci Rep 2024; 14:12761. [PMID: 38834687 DOI: 10.1038/s41598-024-63582-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 05/30/2024] [Indexed: 06/06/2024] Open
Abstract
Abundant research has consistently illustrated the crucial role of microRNAs (miRNAs) in a wide array of essential biological processes. Furthermore, miRNAs have been validated as promising therapeutic targets for addressing complex diseases. Given the costly and time-consuming nature of traditional biological experimental validation methods, it is imperative to develop computational methods. In this work, we developed a novel approach named efficient matrix completion (EMCMDA) for predicting miRNA-disease associations. First, we calculated the similarities across multiple sources for miRNA/disease pairs and combined this information to create a holistic miRNA/disease similarity measure. Second, we utilized this biological information to create a heterogeneous network and established a target matrix derived from this network. Lastly, we framed miRNA-disease association prediction as a low-rank matrix completion problem, which was addressed by minimizing the truncated Schatten p-norm of the matrix. Notably, we improved the conventional singular value contraction algorithm by using a weighted singular value contraction technique. This technique dynamically adjusts the degree of contraction based on the significance of each singular value, ensuring that the physical meaning of these singular values is fully considered. We evaluated the performance of EMCMDA by applying two distinct cross-validation experiments on two diverse databases, and the outcomes were statistically significant. In addition, we executed comprehensive case studies on two prevalent human diseases, namely lung cancer and breast cancer. Following prediction and multiple validations, it was evident that EMCMDA proficiently forecasts previously undisclosed disease-related miRNAs. These results underscore the robustness and efficacy of EMCMDA in miRNA-disease association prediction.
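The idea of weighted singular value contraction inside a matrix completion loop can be sketched as follows. This is an illustration under stated assumptions, not the EMCMDA algorithm: it does not implement the truncated Schatten p-norm objective, and the weighting rule (weights inversely proportional to each singular value, so that large, significant singular values are shrunk less) is an assumed choice. The names `weighted_sv_shrink`, `complete`, `tau`, and the toy matrix are hypothetical.

```python
import numpy as np

def weighted_sv_shrink(M, tau, eps=1e-6):
    """Shrink singular values of M, penalising small ones more heavily."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Assumed weighting: inverse of the singular value (plus eps for stability).
    weights = 1.0 / (s + eps)
    s_shrunk = np.maximum(s - tau * weights, 0.0)
    return (U * s_shrunk) @ Vt

def complete(T, mask, tau=0.5, n_iter=50):
    """Alternate low-rank shrinkage with re-imposing the observed entries."""
    X = T.copy()
    for _ in range(n_iter):
        X = weighted_sv_shrink(X, tau)
        X[mask] = T[mask]  # keep known entries fixed
    return X

# Hypothetical toy target matrix: rank 1 with one unobserved entry.
T = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
mask = np.ones_like(T, dtype=bool)
mask[2, 1] = False                 # treat this entry as unknown
T_obs = np.where(mask, T, 0.0)
X = complete(T_obs, mask)          # X[2, 1] holds the filled-in estimate
```

The alternating scheme keeps the sketch short; a practical solver would typically add a convergence check and tune `tau`.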
Affiliation(s)
- Chao Qin
  - School of Information Science and Engineering, Qilu Normal University, Jinan, 250200, China.
- Jiancheng Zhang
  - School of Information Science and Engineering, Qilu Normal University, Jinan, 250200, China
- Lingyu Ma
  - School of Control Science and Engineering, Harbin Institute of Technology, Weihai, 250200, China
7
Ouyang D, Liang Y, Wang J, Li L, Ai N, Feng J, Lu S, Liao S, Liu X, Xie S. HGCLAMIR: Hypergraph contrastive learning with attention mechanism and integrated multi-view representation for predicting miRNA-disease associations. PLoS Comput Biol 2024; 20:e1011927. [PMID: 38652712 PMCID: PMC11037542 DOI: 10.1371/journal.pcbi.1011927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 02/19/2024] [Indexed: 04/25/2024] Open
Abstract
Existing studies have shown that the abnormal expression of microRNAs (miRNAs) usually leads to the occurrence and development of human diseases. Identifying disease-related miRNAs contributes to studying the pathogenesis of diseases at the molecular level. As traditional biological experiments are time-consuming and expensive, computational methods have been used as an effective complement to infer the potential associations between miRNAs and diseases. However, most of the existing computational methods still face three main challenges: (i) learning of high-order relations; (ii) insufficient representation learning ability; (iii) importance learning and integration of multi-view embedding representation. To this end, we developed a HyperGraph Contrastive Learning with view-aware Attention Mechanism and Integrated multi-view Representation (HGCLAMIR) model to discover potential miRNA-disease associations. First, hypergraph convolutional network (HGCN) was utilized to capture high-order complex relations from hypergraphs related to miRNAs and diseases. Then, we combined HGCN with contrastive learning to improve and enhance the embedded representation learning ability of HGCN. Moreover, we introduced view-aware attention mechanism to adaptively weight the embedded representations of different views, thereby obtaining the importance of multi-view latent representations. Next, we innovatively proposed integrated representation learning to integrate the embedded representation information of multiple views for obtaining more reasonable embedding information. Finally, the integrated representation information was fed into a neural network-based matrix completion method to perform miRNA-disease association prediction. Experimental results on the cross-validation set and independent test set indicated that HGCLAMIR can achieve better prediction performance than other baseline models. 
Furthermore, the results of case studies and enrichment analysis further demonstrated the accuracy of HGCLAMIR and showed that the unconfirmed potential associations had biological significance.
Affiliation(s)
- Dong Ouyang
  - Peng Cheng Laboratory, Shenzhen, China
  - School of Biomedical Engineering, Guangdong Medical University, Dongguan, China
- Yong Liang
  - Peng Cheng Laboratory, Shenzhen, China
  - Pazhou Laboratory (Huangpu), Guangzhou, China
- Jinfeng Wang
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
- Le Li
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Ning Ai
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Junning Feng
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Shanghui Lu
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Shuilin Liao
  - School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau, China
- Xiaoying Liu
  - Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, China
- Shengli Xie
  - Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, China
8
Wang Y, Gao YL, Wang J, Li F, Liu JX. MSGCA: Drug-Disease Associations Prediction Based on Multi-Similarities Graph Convolutional Autoencoder. IEEE J Biomed Health Inform 2023; 27:3686-3694. [PMID: 37163398 DOI: 10.1109/jbhi.2023.3272154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDA prediction. Most current DDA prediction methods use the original DDA matrix directly. However, the original DDA matrix is sparse, which greatly degrades the prediction results. Hence, a prediction method based on a multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDA prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form a new drug similarity and a new disease similarity, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDA matrix is reconstructed by weighted K nearest neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, a graph convolutional autoencoder with an attention mechanism is used to predict DDAs. Compared with extant methods, MSGCA shows superior results on three datasets. Furthermore, case studies further demonstrate the reliability of MSGCA.
9
Hu X, Yin Z, Zeng Z, Peng Y. Prediction of miRNA-Disease Associations by Cascade Forest Model Based on Stacked Autoencoder. Molecules 2023; 28:5013. [PMID: 37446675 DOI: 10.3390/molecules28135013] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/30/2023] [Revised: 06/23/2023] [Accepted: 06/24/2023] [Indexed: 07/15/2023] Open
Abstract
Numerous pieces of evidence have indicated that microRNA (miRNA) plays a crucial role in a series of significant biological processes and is closely related to complex disease. However, the traditional biological experimental methods used to verify disease-related miRNAs are inefficient and expensive. Thus, it is necessary to design more efficient approaches. In this work, a novel method (CFSAEMDA) is proposed for the prediction of unknown miRNA-disease associations (MDAs). Specifically, we first capture the interactive features of miRNA and disease by integrating multi-source information. Then, the stacked autoencoder is applied to obtain the underlying feature representation. Finally, the modified cascade forest model is employed to complete the final prediction. The experimental results show that the AUC value obtained by our method is 97.67%. The performance of CFSAEMDA is superior to several of the latest methods. In addition, case studies conducted on lung neoplasms, breast neoplasms and hepatocellular carcinoma further show that the CFSAEMDA method may be regarded as a useful approach to infer unknown disease-miRNA relationships.
Affiliation(s)
- Xiang Hu
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
- Zhixiang Yin
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
- Zhiliang Zeng
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
- Yu Peng
  - Center of Intelligent Computing and Applied Statistics, School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
10
Hou J, Wei H, Liu B. iPiDA-SWGCN: Identification of piRNA-disease associations based on Supplementarily Weighted Graph Convolutional Network. PLoS Comput Biol 2023; 19:e1011242. [PMID: 37339125 DOI: 10.1371/journal.pcbi.1011242] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/23/2022] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open
Abstract
Accurately identifying potential piRNA-disease associations is of great importance in uncovering the pathogenesis of diseases. Recently, several machine-learning-based methods have been proposed for piRNA-disease association detection. However, they suffer from the high sparsity of the piRNA-disease association network and from a Boolean representation of piRNA-disease associations that ignores confidence coefficients. In this study, we propose a supplementarily weighted strategy to overcome these disadvantages. Combined with Graph Convolutional Networks (GCNs), a novel predictor called iPiDA-SWGCN is proposed for piRNA-disease association prediction. There are three main contributions of iPiDA-SWGCN: (i) Potential piRNA-disease associations are preliminarily supplemented in the sparse piRNA-disease network by integrating various basic predictors to enrich network structure information. (ii) The original Boolean piRNA-disease associations are assigned different relevance confidences so that node representations are learned from neighbour nodes to varying degrees. (iii) The experimental results show that iPiDA-SWGCN achieves the best performance compared with the other state-of-the-art methods, and can predict new piRNA-disease associations.
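How confidence-weighted associations change what a graph convolution aggregates can be shown with a small sketch. It assumes a standard symmetrically normalised GCN layer with self-loops over the bipartite piRNA-disease graph; the abstract does not specify the layer used in iPiDA-SWGCN, and the confidence values, feature sizes, and function names below are hypothetical.

```python
import numpy as np

def normalized_bipartite_adjacency(weights):
    """Build D^{-1/2} (A + I) D^{-1/2} from a piRNA x disease weight matrix."""
    n_p, n_d = weights.shape
    adj = np.zeros((n_p + n_d, n_p + n_d))
    adj[:n_p, n_p:] = weights      # piRNA -> disease edges
    adj[n_p:, :n_p] = weights.T    # disease -> piRNA edges
    adj += np.eye(n_p + n_d)       # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weight):
    """One propagation step: aggregate neighbours, transform, apply ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

rng = np.random.default_rng(0)
# Hypothetical confidences in [0, 1] instead of Boolean 0/1 associations.
conf = np.array([[1.0, 0.0, 0.6],
                 [0.0, 0.8, 0.0]])
adj_norm = normalized_bipartite_adjacency(conf)   # 5 x 5 (2 piRNAs + 3 diseases)
features = rng.normal(size=(5, 4))                # random stand-in node features
hidden = gcn_layer(adj_norm, features, rng.normal(size=(4, 3)))
```

With real-valued edge weights, a neighbour connected with confidence 0.6 contributes less to the aggregated representation than one connected with confidence 1.0, which is the effect the weighting strategy is described as targeting.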
Affiliation(s)
- Jialu Hou
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Hang Wei
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
11
Shen Y, Liu JX, Yin MM, Zheng CH, Gao YL. BMPMDA: Prediction of MiRNA-Disease Associations Using a Space Projection Model Based on Block Matrix. Interdiscip Sci 2023; 15:88-99. [PMID: 36335274 DOI: 10.1007/s12539-022-00542-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/07/2022]
Abstract
With the high-quality development of bioinformatics technology, miRNA-disease associations (MDAs) are gradually being uncovered. At present, convenient and efficient prediction methods that reduce the resource consumption of traditional wet-lab experiments still need to be developed. In this study, a space projection model based on block matrix is presented for predicting MDAs (BMPMDA). Specifically, two block matrices are first composed of the known association matrix and similarity to increase comprehensiveness. For the integrity of information in the heterogeneous network, matrix completion (MC) is utilized to mine potential MDAs. Considering the neighborhood information of data points, linear neighborhood similarity (LNS) is regarded as a measure of similarity. Next, LNS is projected onto the corresponding completed association matrix to derive the projection score. Finally, the AUC and AUPR values for BMPMDA reach 0.9691 and 0.6231, respectively. Additionally, the majority of novel MDAs in three disease cases are identified in existing databases and literature. This suggests that BMPMDA can serve as a reliable prediction model for biological research.
Affiliation(s)
- Yi Shen
  - Qufu Normal University, Rizhao, 276800, China
- Chun-Hou Zheng
  - Co-Innovation Center for Information Supply and Assurance Technology, Anhui University, Hefei, 230000, China
- Ying-Lian Gao
  - Library of Qufu Normal University, Qufu Normal University, Rizhao, 276800, China.
12
Ha J, Park S. NCMD: Node2vec-Based Neural Collaborative Filtering for Predicting MiRNA-Disease Association. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1257-1268. [PMID: 35849666 DOI: 10.1109/tcbb.2022.3191972] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 05/04/2023]
Abstract
Numerous studies have reported that microRNAs (miRNAs) play pivotal roles in disease pathogenesis through the deregulation of the expression of target messenger RNAs. Therefore, the identification of disease-related miRNAs is of great significance in understanding complex human diseases, and can also provide insight into the design of novel prognostic markers and disease therapies. Considering the time and cost involved in wet-lab experiments, most recent works have focused on the effective and feasible modeling of computational frameworks to uncover miRNA-disease associations. In this study, we propose a novel framework called node2vec-based neural collaborative filtering for predicting miRNA-disease association (NCMD) based on deep neural networks. Initially, NCMD exploits Node2vec to learn low-dimensional vector representations of miRNAs and diseases. Next, it utilizes a deep learning framework that combines the linear ability of generalized matrix factorization and the nonlinear ability of a multilayer perceptron. Experimental results clearly demonstrate the comparable performance of NCMD relative to the state-of-the-art methods according to statistical measures. In addition, case studies on breast cancer, lung cancer and pancreatic cancer validate the effectiveness of NCMD. Extensive experiments demonstrate the benefits of modeling a neural collaborative-filtering-based approach for discovering novel miRNA-disease associations.
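The combination of a linear generalized matrix factorization (GMF) branch with a nonlinear multilayer perceptron (MLP) branch can be sketched as a single forward pass. This is a minimal illustration with untrained random weights, assuming the usual neural collaborative filtering layout (element-wise product for GMF, concatenation for the MLP, then a fused sigmoid output); the random vectors stand in for Node2vec embeddings, and all sizes and names are hypothetical rather than taken from NCMD.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
# Random stand-ins for Node2vec embeddings of one miRNA and one disease.
mirna_vec = rng.normal(size=dim)
disease_vec = rng.normal(size=dim)

# Untrained, hypothetical parameters for the MLP branch and the output layer.
W1, b1 = rng.normal(scale=0.1, size=(2 * dim, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)
w_out, b_out = rng.normal(scale=0.1, size=dim + 8), 0.0

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ncf_score(m, d):
    """Fuse a GMF branch and an MLP branch into one association score."""
    gmf = m * d                                   # linear, element-wise interaction
    h = relu(np.concatenate([m, d]) @ W1 + b1)    # nonlinear interaction, layer 1
    h = relu(h @ W2 + b2)                         # nonlinear interaction, layer 2
    fused = np.concatenate([gmf, h])
    return float(sigmoid(fused @ w_out + b_out))  # score in (0, 1)

score = ncf_score(mirna_vec, disease_vec)
```

In a trained model the parameters would be fitted against known associations; here they only demonstrate the data flow between the two branches.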
13
Anish TP, Joe Prathap PM. An efficient and low complex model for optimal RBM features with weighted score-based ensemble multi-disease prediction. Comput Methods Biomech Biomed Engin 2023; 26:350-372. [PMID: 36218238 DOI: 10.1080/10255842.2022.2129969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Multi-disease prediction is the capacity to simultaneously identify the various diseases that are expected to affect an individual in a certain period. These diseases may be at various progression levels and need to be detected in the patient at the time of clinical visits. Diverse studies in the literature have proposed predictive models for particular diseases, yet these models cannot identify patients with multiple diseases, even though people often suffer from more than one disease at a time. Hence, this article aims to implement a novel multi-disease prediction model using an ensemble learning approach with deep features. The required data for the multi-disease prediction are collected from standard datasets. Then, the collected data are fed into a Deep Belief Network (DBN), where the features are obtained from the restricted Boltzmann machine (RBM) layers. These RBM features are tuned with the help of Deviation-based Hybrid Grasshopper Barnacles Mating Optimization (D-HGBMO) to improve the prediction performance. The optimized RBM features are passed to the ensemble learning model, named Ensemble, in which the multi-disease prediction is performed with a Deep Neural Network (DNN), an Extreme Learning Machine (ELM), and Long Short-Term Memory. The predicted scores from the three classifiers are combined through an optimized weighted score and thresholding-based final prediction, using the same D-HGBMO, to determine accurate multi-disease prediction results. The experimental results show the effective performance of the proposed model in comparison with existing classifiers across different quantitative measures.
Affiliation(s)
- T P Anish
- Assistant Professor, Department of Computer Science and Engineering, R.M.K. College of Engineering and Technology, Puduvoyal, India
| | - P M Joe Prathap
- Professor, Department of Computer Science and Engineering, R.M.D. Engineering College, Kavaraipettai, India
| |
|
14
|
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023] Open
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked with the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) by integrating random walk with restart and Logistic Matrix Factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. We first fuse disease semantic and Gaussian association profile similarities and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm to extract negative LncRNA-Disease Associations (LDA) based on random walk. Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods, that is, LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results from 5-fold cross validation on the MNDR dataset show that LDA-RWLMF achieves the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, after determining the performance of LDA-RWLMF, we rank all lncRNA biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer, respectively, among all lncRNAs known to associate with them on the MNDR dataset. We predict that lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively, and need biomedical experimental validation.
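Random walk with restart, which this abstract builds on, can be sketched on a toy graph. The code assumes the standard update p(t+1) = (1 - r) W p(t) + r p(0) with a column-normalised adjacency matrix W; the graph, the restart probability, and the idea of treating low-probability nodes as candidate negatives are illustrative assumptions, not the authors' exact negative selection procedure.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker restarting at `seed`."""
    n = len(adj)
    # Column-normalise so each column holds transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 (hypothetical), walker seeded at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
probs = random_walk_with_restart(path_graph, seed=0)
```

Nodes far from the seed receive low probability, so under the assumption above they would be the ones least likely to be unobserved positives.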
|
15
|
Wang W, Chen H. Predicting miRNA-disease associations based on lncRNA-miRNA interactions and graph convolution networks. Brief Bioinform 2023; 24:6918743. [PMID: 36526276 DOI: 10.1093/bib/bbac495] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 12/23/2022] Open
Abstract
Increasing studies have proved that microRNAs (miRNAs) are critical biomarkers in the development of human complex diseases. Identifying disease-related miRNAs is beneficial to disease prevention, diagnosis and remedy. Based on the assumption that similar miRNAs tend to associate with similar diseases, various computational methods have been developed to predict novel miRNA-disease associations (MDAs). However, selecting proper features for similarity calculation is a challenging task because of data deficiencies in biomedical science. In this study, we propose a deep learning-based computational method named MAGCN to predict potential MDAs without using any similarity measurements. Our method predicts novel MDAs based on known lncRNA-miRNA interactions via graph convolution networks with multichannel attention mechanism and convolutional neural network combiner. Extensive experiments show that the average area under the receiver operating characteristic values obtained by our method under 2-fold, 5-fold and 10-fold cross-validations are 0.8994, 0.9032 and 0.9044, respectively. When compared with five state-of-the-art methods, MAGCN shows improvement in terms of prediction accuracy. In addition, we conduct case studies on three diseases to discover their related miRNAs, and find that all the top 50 predictions for all the three diseases have been supported by established databases. The comprehensive results demonstrate that our method is a reliable tool in detecting new disease-related miRNAs.
|
16
|
Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022; 13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022] Open
Abstract
SNARE proteins are of great importance, and their loss of function can lead to a variety of diseases. The SNARE protein is known as a membrane fusion protein, and it is crucial for mediating vesicle fusion. The identification of SNARE proteins must therefore be conducted with an accurate method. Through extensive experiments, we have developed a binary classification model based on the graph-regularized k-local hyperplane distance nearest neighbor model (GHKNN). The model uses the physicochemical property extraction method to extract protein sequence features and the SMOTE method to upsample protein sequence features. The combination achieves the most accurate performance for identifying all protein sequences. Finally, we compare the GHKNN-based binary classification model with other classifiers and measure them using four different metrics: SN, SP, ACC, and MCC. In experiments, the model performs significantly better than other classifiers.
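The SMOTE upsampling mentioned in this abstract creates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours. The following is a minimal sketch of that idea only; the feature vectors, `k`, and the seed are hypothetical, and a maintained implementation such as the one in imbalanced-learn would normally be preferred.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by SMOTE-style interpolation."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: dist2(base, m),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for the minority (SNARE) class.
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_samples = smote_like(minority_samples, n_new=4)
```

Every synthetic point lies on a segment between two real minority samples, so the class is enlarged without leaving the region the real samples occupy.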
Affiliation(s)
- Xingyue Gu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Pengfeng Xiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| |
|
17
|
Ouyang D, Liang Y, Wang J, Liu X, Xie S, Miao R, Ai N, Li L, Dang Q. Predicting multiple types of miRNA-disease associations using adaptive weighted nonnegative tensor factorization with self-paced learning and hypergraph regularization. Brief Bioinform 2022; 23:6720405. [PMID: 36168938 DOI: 10.1093/bib/bbac390] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022] Open
Abstract
More and more evidence indicates that the dysregulation of microRNAs (miRNAs) leads to diseases through various kinds of underlying mechanisms. Identifying the multiple types of disease-related miRNAs plays an important role in studying the molecular mechanism of miRNAs in diseases. Moreover, compared with traditional biological experiments, computational models save time and cost. However, most tensor-based computational models still face three main challenges: (i) a tendency to fall into bad local minima; (ii) preservation of high-order relations; (iii) false-negative samples. To this end, we propose a novel tensor completion framework integrating self-paced learning, hypergraph regularization and adaptive weight tensor into nonnegative tensor factorization, called SPLDHyperAWNTF, for the discovery of potential multiple types of miRNA-disease associations. We first combine self-paced learning with nonnegative tensor factorization to keep the model from falling into bad local minima. Then, hypergraphs for miRNAs and diseases are constructed, and hypergraph regularization is used to preserve the high-order complex relations of these hypergraphs. Finally, we introduce an adaptive weight tensor, which can effectively alleviate the impact of false-negative samples on the prediction performance. The average results of 5-fold and 10-fold cross-validation on four datasets show that SPLDHyperAWNTF can achieve better prediction performance than baseline models in terms of Top-1 precision, Top-1 recall and Top-1 F1. Furthermore, we implement case studies to further evaluate the accuracy of SPLDHyperAWNTF. As a result, 98 (MDAv2.0) and 98 (MDAv2.0-2) of the top-100 predictions are confirmed by the HMDDv3.2 dataset. Moreover, the results of enrichment analysis illustrate that unconfirmed potential associations have biological significance.
Affiliation(s)
- Dong Ouyang
- Peng Cheng Laboratory, Shenzhen 518055, China.,School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
| | - Yong Liang
- Peng Cheng Laboratory, Shenzhen 518055, China
| | - Jianjun Wang
- School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
| | - Xiaoying Liu
- Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai 519090, China
| | - Shengli Xie
- Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510000, China
| | - Rui Miao
- Basic Teaching Department, ZhuHai Campus of ZunYi Medical University, Zhuhai 519090, China
| | - Ning Ai
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
| | - Le Li
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
| | - Qi Dang
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
| |
|
18
|
Li W, Wang S, Xu J, Xiang J. Inferring Latent MicroRNA-Disease Associations on a Gene-Mediated Tripartite Heterogeneous Multiplexing Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3190-3201. [PMID: 35041612 DOI: 10.1109/tcbb.2022.3143770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
MicroRNA (miRNA) is a class of non-coding single-stranded RNA molecules encoded by endogenous genes with a length of about 22 nucleotides. MiRNAs have been successfully identified as differentially expressed in various cancers. There is evidence that disorders of miRNAs are associated with a variety of complex diseases. Therefore, inferring potential miRNA-disease associations (MDAs) is very important for understanding the aetiology and pathogenesis of many diseases and is useful for disease diagnosis, prognosis and treatment. First, we fused multiple similarity subnetworks from multiple sources for miRNAs, genes and diseases, respectively, by multiplexing technology. Then, the three multiplexed biological subnetworks are connected through the extended binary association to form a tripartite complete heterogeneous multiplexed network (Tri-HM). Finally, because the constructed Tri-HM network retains the subnetworks' original topology and biological functions and expands the binary association and dependence between the three biological entities, rich neighbourhood information is obtained iteratively from neighbours by a non-equilibrium random walk. Our tri-HM-RWR model obtained an AUC value of 0.8657 and an AUPR value of 0.2139 in the global 5-fold cross-validation, which shows that our model can more fully infer disease-related miRNAs.
|
19
|
Hierarchical graph representation learning for the prediction of drug-target binding affinity. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.09.043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
|
20
|
Li M, Fan Y, Zhang Y, Lv Z. Using Sequence Similarity Based on CKSNP Features and a Graph Neural Network Model to Identify miRNA-Disease Associations. Genes (Basel) 2022; 13:1759. [PMID: 36292644 PMCID: PMC9602123 DOI: 10.3390/genes13101759] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 01/12/2024] Open
Abstract
In machine learning studies of the relationship between miRNAs and diseases, prediction results are usually optimized by establishing different machine learning models, and less attention is paid to the feature information contained in the miRNA sequence itself. This study focused on the impact of the different feature information of miRNA sequences on the relationship between miRNA and disease. It was found that, with the same graph neural network, adopting miRNA features based on the K-spacer nucleic acid pair composition (CKSNAP) yielded a better graph neural network prediction model of the miRNA-disease relationship (AUC = 93.71%), which was 0.15% greater than the best model in the literature based on the same benchmark dataset. The optimized model was also used to predict miRNAs related to lung tumors, esophageal tumors, and kidney tumors, and 47, 47, and 37 of the top 50 miRNAs predicted by the model for the three diseases, respectively, were consistent with descriptions in the wet experiment validation database (dbDEMC).
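The k-spaced nucleic acid pair composition named in this abstract is commonly computed by counting, for each gap size g, the 16 possible nucleotide pairs whose members are separated by g residues and dividing by the number of such positions. The sketch below follows that common definition; the maximum gap of 2 and the example sequence are assumptions, since the abstract does not state the settings used in the paper.

```python
from itertools import product


def cksnap(seq, max_gap=2, alphabet="ACGU"):
    """Return the CKSNAP vector: 16 pair frequencies for each gap 0..max_gap."""
    seq = seq.upper().replace("T", "U")
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(seq) - gap - 1  # positions where a gapped pair fits
        for i in range(n_windows):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # skip pairs containing ambiguous bases
                counts[pair] += 1
        features.extend(
            counts[p] / n_windows if n_windows > 0 else 0.0 for p in pairs
        )
    return features


# Example 22-nt miRNA-like sequence (illustrative input only).
vector = cksnap("UGAGGUAGUAGGUUGUAUAGUU")
tiny = cksnap("ACGU", max_gap=0)
```

The resulting fixed-length vector (16 values per gap) can serve as a node feature for a graph neural network regardless of the sequence length.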
Affiliation(s)
- Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Yu Fan
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Yiting Zhang
- College of Biology, Southwest Jiaotong University, Chengdu 611756, China
- College of Biology, Georgia State University, Atlanta, GA 30302-3965, USA
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| |
|
21
|
Asim MN, Ibrahim MA, Zehe C, Trygg J, Dengel A, Ahmed S. BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction. Interdiscip Sci 2022; 14:841-862. [PMID: 35947255 PMCID: PMC9581873 DOI: 10.1007/s12539-022-00535-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Revised: 06/16/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
Abstract
Background and objective: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation, cellular metabolic, and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly due to the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs between highly flexible length lncRNA sequences. Method: The paper at hand performs in-depth exploration of diverse copy padding and sequence truncation approaches, and presents a novel idea of utilizing only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, "BoT-Net", which leverages a single-layer long short-term memory network regularized through DropConnect to capture higher order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradient issues, learning rate decay, and dropout to regularize a precise neural network for lncRNA–miRNA interaction prediction. Results: BoT-Net outperforms the state-of-the-art lncRNA–miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA–protein interaction predictor on a benchmark dataset by accuracy of 10%, sensitivity of 19%, specificity of 6%, precision of 14%, and Matthews correlation coefficient of 26%. Conclusion: In the benchmark lncRNA–miRNA interaction prediction dataset, the length of the lncRNA sequence varies from 213 residues to 22,743 residues, and in the benchmark lncRNA–protein interaction prediction dataset, lncRNA sequences vary from 15 residues to 1504 residues.
For such highly flexible length sequences, fixed-length generation using copy padding introduces a significant level of bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of the starting region of long lncRNA sequences alone contain a highly informative distribution for lncRNA–miRNA interaction prediction, a crucial finding exploited by the proposed BoT-Net approach to optimize the lncRNA fixed-length generation process. Availability: BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.
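The two fixed-length generation strategies contrasted in this abstract can be sketched as follows. The helper names are hypothetical, and copy padding is assumed here to mean repeating the sequence until the target length is reached; the 50-residue start window is taken from the abstract.

```python
def truncate_start(seq, length=50):
    """Keep only the first `length` residues of the sequence."""
    return seq[:length]


def copy_pad(seq, length):
    """Repeat the sequence until it reaches `length`, then cut to size."""
    if not seq:
        raise ValueError("cannot pad an empty sequence")
    repeats = -(-length // len(seq))  # ceiling division
    return (seq * repeats)[:length]


# Made-up sequences for illustration.
long_lncrna = "ACGU" * 100          # 400 residues
short_lncrna = "ACGUA"              # 5 residues

start_region = truncate_start(long_lncrna)   # first 50 residues
padded = copy_pad(short_lncrna, 12)          # repeated copies of the input
```

Copy padding fills most of a long target length with repeats of the same short motif, which is the source of the bias the abstract describes; taking a fixed start window avoids that at the cost of discarding the rest of the sequence.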
Affiliation(s)
- Muhammad Nabeel Asim
- Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany.
| | - Muhammad Ali Ibrahim
- Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
| | - Christoph Zehe
- Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany
| | - Johan Trygg
- Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany
- Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden
| | - Andreas Dengel
- Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden
| |
|
22
|
MLapSVM-LBS: Predicting DNA-binding proteins via a multiple Laplacian regularized support vector machine with local behavior similarity. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
23
|
Huang D, An J, Zhang L, Liu B. Computational method using heterogeneous graph convolutional network model combined with reinforcement layer for MiRNA-disease association prediction. BMC Bioinformatics 2022; 23:299. [PMID: 35879658 PMCID: PMC9316361 DOI: 10.1186/s12859-022-04843-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: A large body of evidence from biological experiments has confirmed that miRNAs play an important role in the progression and development of various complex human diseases. However, traditional experimental methods are expensive and time-consuming. Developing more accurate and efficient methods for predicting potential associations between miRNAs and diseases is therefore a challenging task. RESULTS: In this study, we developed a computational model that combines a heterogeneous graph convolutional network with an enhanced layer for miRNA-disease association prediction (HGCNELMDA). The major improvements of our method are that random walk with restart is used to optimize the original features of nodes, and that a reinforcement layer added to the hidden layer of the graph convolutional network retains similarity information between nodes in the feature space. In addition, the proposed approach recalculates the influence of neighborhood nodes on target nodes by introducing the attention mechanism. The reliable performance of HGCNELMDA was confirmed by an AUC of 93.47% in global leave-one-out cross-validation (LOOCV) and an average AUC of 93.01% in fivefold cross-validation. Meanwhile, we compared HGCNELMDA with state-of-the-art methods. Comparative results indicated that HGCNELMDA is very promising and may provide a cost-effective alternative for miRNA-disease association prediction. Moreover, we applied HGCNELMDA to 3 different case studies to predict potential miRNAs related to lung cancer, prostate cancer, and pancreatic cancer. Results showed that 48, 50, and 50 of the top 50 predicted miRNAs were supported by experimental association evidence. Therefore, HGCNELMDA is a reliable method for predicting disease-related miRNAs. CONCLUSIONS: The results of the HGCNELMDA method in LOOCV and fivefold cross-validation were 93.47% and 93.01%, respectively.
Compared with other typical methods, the performance of HGCNELMDA is higher. Three cases of lung cancer, prostate cancer, and pancreatic cancer were studied. Among the predicted top 50 candidate miRNAs, 48, 50, and 50 were verified in the biological database HDMMV2.0. Therefore, this further confirms the feasibility and effectiveness of our method. To facilitate future research on disease-related miRNAs, we developed a freely available web server, HGCNELMDA, which is available at http://124.221.62.44:8080/HGCNELMDA.jsp .
Affiliation(s)
- Dan Huang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
| | - JiYong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China.
| | - Lei Zhang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China.
| | - BaiLong Liu
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
| |
|
24
|
Wang W, Chen H. Predicting miRNA-disease associations based on graph attention networks and dual Laplacian regularized least squares. Brief Bioinform 2022; 23:6645486. [PMID: 35849099 DOI: 10.1093/bib/bbac292] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Revised: 06/23/2022] [Accepted: 06/26/2022] [Indexed: 01/05/2023] Open
Abstract
Increasing biomedical evidence has proved that the dysregulation of miRNAs is associated with human complex diseases. Identification of disease-related miRNAs is of great importance for disease prevention, diagnosis and remedy. To reduce the time and cost of biomedical experiments, there is a strong incentive to develop efficient computational methods to infer potential miRNA-disease associations. Although many computational approaches have been proposed to address this issue, the prediction accuracy needs to be further improved. In this study, we present a computational framework MKGAT to predict possible associations between miRNAs and diseases through graph attention networks (GATs) using dual Laplacian regularized least squares. We use GATs to learn embeddings of miRNAs and diseases on each layer from initial input features of known miRNA-disease associations, intra-miRNA similarities and intra-disease similarities. We then calculate kernel matrices of miRNAs and diseases based on Gaussian interaction profile (GIP) with the learned embeddings. We further fuse the kernel matrices of each layer and initial similarities with attention mechanism. Dual Laplacian regularized least squares are finally applied for new miRNA-disease association predictions with the fused miRNA and disease kernels. Compared with six state-of-the-art methods by 5-fold cross-validations, our method MKGAT receives the highest AUROC value of 0.9627 and AUPR value of 0.7372. We use MKGAT to predict related miRNAs for three cancers and discover that all the top 50 predicted results in the three diseases are confirmed by existing databases. The excellent performance indicates that MKGAT would be a useful computational tool for revealing disease-related miRNAs.
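The Gaussian interaction profile (GIP) kernel used in this abstract is commonly defined as K(i, j) = exp(-γ ‖v_i - v_j‖²), with γ set to a base bandwidth divided by the mean squared norm of the profiles. The sketch below assumes that common definition with a base bandwidth of 1; MKGAT applies the kernel to learned embeddings, and its exact bandwidth setting is not given in the abstract, so the inputs here are toy vectors.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Return the GIP kernel matrix for a list of equal-length profiles."""
    n = len(profiles)
    sq_norms = [sum(v * v for v in p) for p in profiles]
    mean_sq = sum(sq_norms) / n
    # Normalise the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq if mean_sq else 0.0
    return [
        [
            math.exp(
                -gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy association profiles: rows are miRNAs, columns are diseases (made up).
toy_profiles = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K = gip_kernel(toy_profiles)
```

Identical profiles receive similarity 1, and similarity decays smoothly as profiles diverge, which makes the matrix usable as a kernel in regularized least squares.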
Affiliation(s)
- Wengang Wang
- School of Software, East China Jiaotong University, Nanchang 330013, China
| | - Hailin Chen
- School of Software, East China Jiaotong University, Nanchang 330013, China
| |
|
25
|
Ai C, Yang H, Ding Y, Tang J, Guo F. A multi-layer multi-kernel neural network for determining associations between non-coding RNAs and diseases. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
|
26
|
Cai L, Gao M, Ren X, Fu X, Xu J, Wang P, Chen Y. MILNP: Plant lncRNA-miRNA Interaction Prediction Based on Improved Linear Neighborhood Similarity and Label Propagation. FRONTIERS IN PLANT SCIENCE 2022; 13:861886. [PMID: 35401586 PMCID: PMC8990282 DOI: 10.3389/fpls.2022.861886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Knowledge of the interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) is the basis of understanding various biological activities and designing new drugs. Computational methods for predicting lncRNA-miRNA interactions are lacking for plants, and existing methods suffer from various limitations that affect their prediction accuracy and applicability. Research on plant lncRNA-miRNA interactions is still in its infancy. In this paper, we propose an accurate predictor, MILNP, for predicting plant lncRNA-miRNA interactions based on an improved linear neighborhood similarity measurement and a linear neighborhood propagation algorithm. Specifically, we propose a novel similarity measure based on linear neighborhood similarity from multiple similarity profiles of lncRNAs and miRNAs and derive more precise neighborhood ranges so as to escape the limits of the existing methods. We then simultaneously update the lncRNA-miRNA interactions predicted from both similarity matrices based on label propagation. We comprehensively evaluate MILNP on the latest plant lncRNA-miRNA interaction benchmark datasets. The results demonstrate the superior performance of MILNP over the most up-to-date methods. Moreover, MILNP can be leveraged for isolated plant lncRNAs (or miRNAs). Case studies suggest that MILNP can identify novel plant lncRNA-miRNA interactions, which are confirmed by classical tools. The implementation is available on https://github.com/HerSwain/gra/tree/MILNP.
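Label propagation over a similarity matrix, the mechanism this abstract relies on, can be sketched with the usual iterative update F ← αWF + (1 - α)Y. The similarity values, interaction matrix, and α below are hypothetical, and a plain row-normalised similarity stands in for the paper's improved linear neighborhood similarity, which is not reproduced.

```python
def label_propagation(similarity, labels, alpha=0.5, n_iter=200):
    """Propagate known interaction labels to similar entities."""
    n = len(similarity)
    # Row-normalise the similarity matrix so each row sums to 1.
    W = []
    for row in similarity:
        s = sum(row)
        W.append([v / s if s else 0.0 for v in row])
    n_cols = len(labels[0])
    F = [row[:] for row in labels]
    for _ in range(n_iter):
        # Mix neighbour scores with the original labels.
        F = [
            [
                alpha * sum(W[i][k] * F[k][c] for k in range(n))
                + (1 - alpha) * labels[i][c]
                for c in range(n_cols)
            ]
            for i in range(n)
        ]
    return F


# Three lncRNAs (rows) and two miRNAs (columns); lncRNA 1 has no known links.
lnc_similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
known = [
    [1, 0],
    [0, 0],
    [0, 1],
]
scores = label_propagation(lnc_similarity, known)
```

In this toy case lncRNA 1 is most similar to lncRNA 0, so it inherits a higher score for miRNA 0 than for miRNA 1, which is how an isolated entity can still receive predictions.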
Affiliation(s)
| | | | | | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | | | - Peng Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | | |
|
27
|
Chen XG, Liu S, Zhang W. Predicting Coding Potential of RNA Sequences by Solving Local Data Imbalance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1075-1083. [PMID: 32886613 DOI: 10.1109/tcbb.2020.3021800] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Non-coding RNAs (ncRNAs) play an important role in various biological processes and are associated with diseases. Distinguishing between coding RNAs and ncRNAs, also known as predicting the coding potential of RNA sequences, is critical for downstream biological function analysis. Many machine learning-based methods have been proposed for predicting the coding potential of RNA sequences. Recent studies reveal that most existing methods perform poorly on RNA sequences with short Open Reading Frames (sORF, ORF length < 303 nt). In this work, we analyze the distribution of ORF length of RNA sequences, and observe that the number of coding RNAs with sORF is inadequate and that coding RNAs with sORF are far fewer than ncRNAs with sORF. Thus, there exists a problem of local data imbalance in RNA sequences with sORF. We propose a coding potential prediction method, CPE-SLDI, which uses data oversampling techniques to augment samples for coding RNAs with sORF so as to alleviate local data imbalance. Compared with existing methods, CPE-SLDI achieves better performance, and studies reveal that data augmentation by various data oversampling techniques can enhance the performance of coding potential prediction, especially for RNA sequences with sORF. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPESLDI.
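Grouping sequences by ORF length, as this abstract does, first requires locating the longest ORF. The sketch below scans the three forward reading frames for an ATG followed by an in-frame stop codon; the 303 nt cutoff comes from the abstract, while counting the stop codon in the length and ignoring the reverse strand are simplifying assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in nt of the longest ATG-to-stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def has_short_orf(seq, cutoff=303):
    """True when the longest ORF exists and is shorter than the cutoff."""
    length = longest_orf_length(seq)
    return 0 < length < cutoff


example = "CCATGAAATTTTAGCC"  # made-up sequence with one 12-nt ORF
orf_len = longest_orf_length(example)
```

Sequences flagged by `has_short_orf` would form the subgroup where, per the abstract, coding RNAs are scarce and oversampling is applied.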
|
28
|
Li Z, Zhong T, Huang D, You ZH, Nie R. Hierarchical graph attention network for miRNA-disease association prediction. Mol Ther 2022; 30:1775-1786. [PMID: 35121109 PMCID: PMC9077381 DOI: 10.1016/j.ymthe.2022.01.041] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Revised: 12/29/2021] [Accepted: 01/28/2022] [Indexed: 11/25/2022] Open
Abstract
Many biological studies show that the mutation and abnormal expression of microRNAs (miRNAs) could cause a variety of diseases. As an important biomarker for disease diagnosis, miRNA is helpful for understanding pathogenesis, and could promote the identification, diagnosis and treatment of diseases. However, the pathogenic mechanism by which miRNAs affect these diseases has not been fully understood. Therefore, predicting the potential miRNA-disease associations is of great importance for the development of clinical medicine and drug research. In this study, we proposed a novel deep learning model based on hierarchical graph attention network for predicting miRNA-disease associations (HGANMDA). Firstly, we constructed a miRNA-disease-lncRNA heterogeneous graph based on known miRNA-disease associations, miRNA-lncRNA associations and disease-lncRNA associations. Secondly, the node-layer attention was applied to learn the importance of neighbor nodes based on different meta-paths. Thirdly, the semantic-layer attention was applied to learn the importance of different meta-paths. Finally, a bilinear decoder was employed to reconstruct the connections between miRNAs and diseases. The extensive experimental results indicated that our model achieved good performance and satisfactory results in predicting miRNA-disease associations.
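A bilinear decoder of the kind mentioned in this abstract typically scores a pair as sigmoid(mᵀ W d), where m and d are the learned miRNA and disease embeddings and W is a trainable matrix. The sketch below assumes that standard form; the embeddings and the identity weight matrix are hypothetical stand-ins for values a trained model would supply.

```python
import math


def bilinear_score(mirna_emb, weight, disease_emb):
    """Return sigmoid(m^T W d) as an association probability."""
    # W d: project the disease embedding through the weight matrix.
    projected = [sum(w * x for w, x in zip(row, disease_emb)) for row in weight]
    # m^T (W d): inner product with the miRNA embedding.
    logit = sum(a * b for a, b in zip(mirna_emb, projected))
    return 1.0 / (1.0 + math.exp(-logit))


identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder for a learned matrix
aligned = bilinear_score([1.0, 0.0], identity, [1.0, 0.0])
orthogonal = bilinear_score([1.0, 0.0], identity, [0.0, 1.0])
```

With an identity weight matrix the decoder reduces to a plain inner product; a learned W lets the model weight and mix embedding dimensions when reconstructing associations.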
|
29
|
Niu Y, Song C, Gong Y, Zhang W. MiRNA-Drug Resistance Association Prediction Through the Attentive Multimodal Graph Convolutional Network. Front Pharmacol 2022; 12:799108. [PMID: 35095506 PMCID: PMC8790023 DOI: 10.3389/fphar.2021.799108] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/21/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022]
Abstract
MiRNAs can regulate genes encoding specific proteins which are related to the efficacy of drugs, and predicting miRNA-drug resistance associations is of great importance. In this work, we propose an attentive multimodal graph convolution network method (AMMGC) to predict miRNA-drug resistance associations. AMMGC learns the latent representations of drugs and miRNAs from four graph convolution sub-networks with distinctive combinations of features. Then, an attention neural network is employed to obtain attentive representations of drugs and miRNAs, and miRNA-drug resistance associations are predicted by the inner product of learned attentive representations. The computational experiments show that AMMGC outperforms other state-of-the-art methods and baseline methods, achieving the AUPR score of 0.2399 and the AUC score of 0.9467. The analysis demonstrates that leveraging multiple features of drugs and miRNAs can make a contribution to the miRNA-drug resistance association prediction. The usefulness of AMMGC is further validated by case studies.
Affiliation(s)
- Yanqing Niu: School of Mathematics and Statistics, South-Central University for Nationalities, Wuhan, China
- Congzhi Song: College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yuchong Gong: School of Computer Science, Wuhan University, Wuhan, China
- Wen Zhang: College of Informatics, Huazhong Agricultural University, Wuhan, China
|
30
|
Fu H, Huang F, Liu X, Qiu Y, Zhang W. MVGCN: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks. Bioinformatics 2022; 38:426-434. [PMID: 34499148 DOI: 10.1093/bioinformatics/btab651] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 04/24/2021] [Revised: 08/07/2021] [Accepted: 09/06/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION: There are various interaction/association bipartite networks in biomolecular systems. Identifying unobserved links in biomedical bipartite networks helps to understand the underlying molecular mechanisms of human complex diseases and thus benefits the diagnosis and treatment of diseases. Although a great number of computational methods have been proposed to predict links in biomedical bipartite networks, most of them heavily depend on features and structures involving the bioentities in one specific bipartite network, which limits the generalization capacity of applying the models to other bipartite networks. Meanwhile, bioentities usually have multiple features, and how to leverage them has also been challenging. RESULTS: In this study, we propose a novel multi-view graph convolution network (MVGCN) framework for link prediction in biomedical bipartite networks. We first construct a multi-view heterogeneous network (MVHN) by combining the similarity networks with the biomedical bipartite network, and then perform a self-supervised learning strategy on the bipartite network to obtain node attributes as initial embeddings. Further, a neighborhood information aggregation (NIA) layer is designed for iteratively updating the embeddings of nodes by aggregating information from inter- and intra-domain neighbors in every view of the MVHN. Next, we combine embeddings of multiple NIA layers in each view, and integrate multiple views to obtain the final node embeddings, which are then fed into a discriminator to predict the existence of links. Extensive experiments show MVGCN performs better than or on par with baseline methods and has the generalization capacity on six benchmark datasets involving three typical tasks. AVAILABILITY AND IMPLEMENTATION: Source code and data can be downloaded from https://github.com/fuhaitao95/MVGCN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Haitao Fu: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Feng Huang: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xuan Liu: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yang Qiu: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wen Zhang: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
31
|
Zhang G, Li M, Deng H, Xu X, Liu X, Zhang W. SGNNMD: signed graph neural network for predicting deregulation types of miRNA-disease associations. Brief Bioinform 2021; 23:6455665. [PMID: 34875683 DOI: 10.1093/bib/bbab464] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/05/2021] [Revised: 10/08/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
MiRNAs are a class of small non-coding RNA molecules that play an important role in many biological processes, and determining miRNA-disease associations can benefit drug development and clinical diagnosis. Although great efforts have been made to develop miRNA-disease association prediction methods, little attention has been paid to in-depth classification of miRNA-disease associations, e.g. up/down-regulation of miRNAs in diseases. In this paper, we regard known miRNA-disease associations as a signed bipartite network, which has miRNA nodes, disease nodes and two types of edges representing up/down-regulation of miRNAs in diseases, and propose a signed graph neural network method (SGNNMD) for predicting deregulation types of miRNA-disease associations. SGNNMD extracts subgraphs around miRNA-disease pairs from the signed bipartite network and learns structural features of subgraphs via a labeling algorithm and a neural network, and then combines them with biological features (i.e. miRNA-miRNA functional similarity and disease-disease semantic similarity) to build the prediction model. In the computational experiments, SGNNMD achieves highly competitive performance when compared with several baselines, including signed graph link prediction methods, multi-relation prediction methods and one existing deregulation type prediction method. Moreover, SGNNMD has good inductive capability and can generalize to miRNAs/diseases unseen during training.
Affiliation(s)
- Guangzhan Zhang: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Menglu Li: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Huan Deng: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinran Xu: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xuan Liu: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wen Zhang: College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
32
|
Wang W, Wang Y, Zhang Y, Liu D, Zhang H, Wang X. PPDTS: Predicting potential drug-target interactions based on network similarity. IET Syst Biol 2021; 16:18-27. [PMID: 34783172 PMCID: PMC8849239 DOI: 10.1049/syb2.12037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 10/06/2021] [Accepted: 11/04/2021] [Indexed: 11/19/2022]
Abstract
Identification of drug–target interactions (DTIs) has great practical importance in the drug discovery process for known diseases. However, only a small proportion of the DTIs in existing databases has been verified experimentally, and predicting the interactions computationally remains challenging. As a result, effective computational models have become increasingly popular for predicting DTIs. In this work, the authors predict potential DTIs from the local structure of the drug–target association network, which differs from the traditional global network similarity methods based on structure and ligand. A novel method called PPDTS is proposed to predict DTIs. First, according to the local structure of the DTI network, the known DTIs are converted into a binary network. Second, the Resource Allocation algorithm is used to obtain a drug–drug similarity network and a target–target similarity network. Third, a Collaborative Filtering algorithm is used with the known drug–target topology information to obtain similarity scores. Fourth, a linear combination of the drug–target similarity model and the target–drug similarity model is proposed to obtain the final prediction results. Finally, the experimental performance of PPDTS is shown to be higher than that of four popular network-based similarity methods, as validated on different experimental datasets. Some of the predicted results are supported by the UniProt and DrugBank databases.
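The second step can be sketched with the standard resource-allocation index from link prediction, in which two drugs are more similar when they share targets that few other drugs hit. This is an assumption about the flavour of "Resource Allocation" used; the abstract does not give the formula, and the function name and toy network below are hypothetical.

```python
def resource_allocation_similarity(interactions):
    """Drug-drug similarity from a binary drug-target network.

    interactions: dict mapping each drug to the set of its known targets.
    The score of a drug pair is the sum, over their shared targets, of
    1 / (number of drugs that hit that target).
    Returns a dict keyed by (drug_a, drug_b) with drug_a < drug_b.
    """
    # degree of every target = how many drugs interact with it
    target_degree = {}
    for targets in interactions.values():
        for t in targets:
            target_degree[t] = target_degree.get(t, 0) + 1

    drugs = sorted(interactions)
    similarity = {}
    for i, a in enumerate(drugs):
        for b in drugs[i + 1:]:
            shared = interactions[a] & interactions[b]
            similarity[(a, b)] = sum(1.0 / target_degree[t] for t in shared)
    return similarity


toy_network = {"d1": {"t1", "t2"}, "d2": {"t1"}, "d3": {"t2", "t3"}}
sim = resource_allocation_similarity(toy_network)
# d1 and d2 share t1, which two drugs hit, so their score is 0.5;
# d2 and d3 share no target, so their score is 0.0
```

Swapping the roles of drugs and targets in the input gives the target-target similarity network in the same way.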
Affiliation(s)
- Wei Wang: College of Computer and Information Engineering, Henan Normal University, Xinxiang, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Henan Normal University, Xinxiang, China; Big Data Engineering Laboratory for Teaching Resources and Assessment of Education Quality of Henan Province, Henan Normal University, Xinxiang, China
- Yongqing Wang: College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Yu Zhang: College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Dong Liu: College of Computer and Information Engineering, Henan Normal University, Xinxiang, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Henan Normal University, Xinxiang, China; Big Data Engineering Laboratory for Teaching Resources and Assessment of Education Quality of Henan Province, Henan Normal University, Xinxiang, China
- Hongjun Zhang: Computer Science and Technology, Anyang University, Anyang, China
- Xianfang Wang: Computer Science and Technology, Henan Institute of Technology, Xinxiang, China
|
33
|
Yang J, He S, Zhang Z, Bo X. NegStacking: Drug-Target Interaction Prediction Based on Ensemble Learning and Logistic Regression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2624-2634. [PMID: 31985434 DOI: 10.1109/tcbb.2020.2968025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Identification of drug-target interactions (DTIs) is an important issue in drug research, and many machine learning methods proposed to predict potential DTIs treat it as a binary classification problem. However, the number of known interacting drug-target pairs (positive samples) is far smaller than that of non-interacting pairs (negative samples). Most methods do not utilize these large numbers of negative samples sufficiently, which limits their prediction performance. To address this problem, we proposed a stacking framework named NegStacking. First, it uses sampling to obtain multiple completely different negative sample sets. Then, each weak learner is trained with a different negative sample set and the same positive sample set, and logistic regression (LR) is used as a meta-learner to adaptively combine these weak learners. Moreover, in the training process, feature subspacing and hyperparameter perturbation are applied to increase ensemble diversity. Finally, the trained model can be used to predict new samples. We compared NegStacking with other methods, and the experimental results show that our model is superior. NegStacking can improve the performance of DTI prediction, and it has broad application prospects for improving the drug discovery process. The source code and datasets are available at https://github.com/Open-ss/NegStacking.
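The first step, drawing several completely different negative sample sets, can be sketched as a shuffle followed by a partition. This is a minimal illustration under the assumption that "completely different" means pairwise disjoint; the function name and parameters are hypothetical and do not come from the NegStacking repository. Training the weak learners and the logistic-regression meta-learner is left out.

```python
import random


def disjoint_negative_sets(negatives, set_size, n_sets, seed=0):
    """Split a pool of negative samples into n_sets disjoint subsets.

    Each subset would be paired with the same positive set to train one
    weak learner. Raises ValueError if the pool is too small to keep the
    subsets disjoint.
    """
    if set_size * n_sets > len(negatives):
        raise ValueError("not enough negatives for disjoint sets")
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # consecutive, non-overlapping slices of the shuffled pool
    return [shuffled[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


negative_pool = list(range(20))  # stand-ins for non-interacting drug-target pairs
subsets = disjoint_negative_sets(negative_pool, set_size=5, n_sets=3)
```

Setting `set_size` to the number of positive samples gives each weak learner a balanced training set.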
|
34
|
Wang YT, Li L, Ji CM, Zheng CH, Ni JC. ILPMDA: Predicting miRNA-Disease Association Based on Improved Label Propagation. Front Genet 2021; 12:743665. [PMID: 34659364 PMCID: PMC8514753 DOI: 10.3389/fgene.2021.743665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Accepted: 08/30/2021] [Indexed: 12/21/2022]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that have been demonstrated to be related to numerous complex human diseases. Considerable studies have suggested that miRNAs affect many complicated bioprocesses. Hence, the investigation of disease-related miRNAs by utilizing computational methods is warranted. In this study, we presented an improved label propagation for miRNA-disease association prediction (ILPMDA) method to observe disease-related miRNAs. First, we utilized similarity kernel fusion to integrate different types of biological information for generating miRNA and disease similarity networks. Second, we applied the weighted k-nearest known neighbor algorithm to update verified miRNA-disease association data. Third, we utilized improved label propagation in disease and miRNA similarity networks to make association prediction. Furthermore, we obtained final prediction scores by adopting an average ensemble method to integrate the two kinds of prediction results. To evaluate the prediction performance of ILPMDA, two types of cross-validation methods and case studies on three significant human diseases were implemented to determine the accuracy and effectiveness of ILPMDA. All results demonstrated that ILPMDA had the ability to discover potential miRNA-disease associations.
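For orientation, basic label propagation on a similarity network iterates F ← αSF + (1 − α)Y, where S is a row-normalised similarity matrix and Y holds the known associations. The sketch below shows only this textbook iteration for a single disease; it is not the improved variant the abstract describes, and the function name, α value and toy matrix are assumptions.

```python
def label_propagation(similarity, labels, alpha=0.5, n_iter=50):
    """Propagate known association labels over a similarity network.

    similarity: n x n list of lists with non-negative similarities
    labels:     length-n list, 1.0 for a known association, 0.0 otherwise
    alpha:      weight of neighbour information vs. the initial labels
    Returns a length-n list of propagated scores.
    """
    n = len(labels)
    # row-normalise so each node averages over its neighbours
    normalised = []
    for row in similarity:
        total = sum(row)
        normalised.append([v / total if total else 0.0 for v in row])

    scores = list(labels)
    for _ in range(n_iter):
        scores = [
            alpha * sum(normalised[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * labels[i]
            for i in range(n)
        ]
    return scores


# miRNAs 0 and 1 are similar to each other; miRNA 2 is isolated
toy_similarity = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
scores = label_propagation(toy_similarity, [1.0, 0.0, 0.0])
```

In the toy network the unlabeled miRNA 1 inherits a positive score from its labeled neighbour, while the isolated miRNA 2 stays at zero.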
Affiliation(s)
- Yu-Tian Wang: School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Lei Li: School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Cun-Mei Ji: School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Chun-Hou Zheng: School of Artificial Intelligence, Anhui University, Hefei, China
- Jian-Cheng Ni: School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
|
35
|
Li M, Wang Y, Li F, Zhao Y, Liu M, Zhang S, Bin Y, Smith AI, Webb GI, Li J, Song J, Xia J. A Deep Learning-Based Method for Identification of Bacteriophage-Host Interaction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1801-1810. [PMID: 32813660 PMCID: PMC8703204 DOI: 10.1109/tcbb.2020.3017386] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 06/11/2023]
Abstract
Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and novel treatment methods of infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative to solve this problem, to which the key is correctly matching target pathogenic bacteria with the corresponding therapeutic phage. Deep learning is powerful for mining complex patterns to generate accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool capable of predicting the host of phages from sequence data. We collect >3000 phage-host pairs along with their protein sequences from PhagesDB and GenBank databases and extract a set of features. Then we select high-quality negative samples based on the K-Means clustering method and construct a balanced training set. Finally, we employ a deep convolutional neural network to build the predictive model. The results indicate that PredPHI can achieve a predictive performance of 81 percent in terms of the area under the receiver operating characteristic curve on the test set, and the clustering-based method is significantly more robust than that based on randomly selecting negative samples. These results highlight that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.
|
36
|
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
Affiliation(s)
- Qiuying Dai: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanyi Chu: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhiqi Li: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yusong Zhao: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xueying Mao: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Dong-Qing Wei: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China; Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
|
37
|
Li W, Wang S, Xu J. An Ensemble Matrix Completion Model for Predicting Potential Drugs Against SARS-CoV-2. Front Microbiol 2021; 12:694534. [PMID: 34367094 PMCID: PMC8334363 DOI: 10.3389/fmicb.2021.694534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2021] [Accepted: 06/22/2021] [Indexed: 11/13/2022]
Abstract
Because of the catastrophic outbreak of global coronavirus disease 2019 (COVID-19) and its strong infectivity and possible persistence, computational repurposing of existing approved drugs is a promising strategy that facilitates rapid clinical treatment decisions and provides reasonable justification for subsequent clinical trials and regulatory reviews. Since the effects of a small number of conditionally marketed vaccines need further clinical observation, there is still an urgent need to quickly and effectively repurpose potentially available drugs before the next disease peak. In this work, we have manually collected a set of experimentally confirmed virus-drug associations from publicly available databases and the literature, consisting of 175 drugs and 95 viruses, as well as 933 virus-drug associations. Because the samples are extremely sparse and unbalanced and negative samples cannot be easily obtained, we have developed an ensemble model, EMC-Voting, based on matrix completion and weighted soft voting, a semi-supervised machine learning model for computational drug repurposing. Finally, we have evaluated the prediction performance of EMC-Voting by fivefold cross-validation and compared it with other baseline classifiers and prediction models. The case study for the virus SARS-CoV-2 included in the dataset demonstrates that our model outperforms the others, achieving an AUPR value of 0.934 in virus-drug association prediction.
Affiliation(s)
- Shulin Wang: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
38
|
Chen XG, Zhang W, Yang X, Li C, Chen H. ACP-DA: Improving the Prediction of Anticancer Peptides Using Data Augmentation. Front Genet 2021; 12:698477. [PMID: 34276801 PMCID: PMC8279753 DOI: 10.3389/fgene.2021.698477] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 04/21/2021] [Accepted: 06/07/2021] [Indexed: 12/09/2022]
Abstract
Anticancer peptides (ACPs) have provided a promising perspective for cancer treatment, and the prediction of ACPs is very important for the discovery of new cancer treatment drugs. It is time consuming and expensive to use experimental methods to identify ACPs, so computational methods for ACP identification are urgently needed. There have been many effective computational methods, especially machine learning-based methods, proposed for such predictions. Most of the current machine learning methods try to find suitable features or design effective feature learning techniques to accurately represent ACPs. However, the performance of these methods can be further improved for cases with insufficient numbers of samples. In this article, we propose an ACP prediction model called ACP-DA (Data Augmentation), which uses data augmentation for insufficient samples to improve the prediction performance. In our method, to better exploit the information of peptide sequences, peptide sequences are represented by integrating binary profile features and AAindex features, and then the samples in the training set are augmented in the feature space. After data augmentation, the samples are used to train the machine learning model, which is used to predict ACPs. The performance of ACP-DA exceeds that of existing methods, and ACP-DA achieves better performance in the prediction of ACPs compared with a method without data augmentation. The proposed method is available at http://github.com/chenxgscuec/ACPDA.
Affiliation(s)
- Xian-Gan Chen: School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
- Wen Zhang: College of Informatics, Huazhong Agricultural University, Wuhan, China; Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
- Xiaofei Yang: School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
- Chenhong Li: School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
- Hengling Chen: School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
|
39
|
Qian Y, Jiang L, Ding Y, Tang J, Guo F. A sequence-based multiple kernel model for identifying DNA-binding proteins. BMC Bioinformatics 2021; 22:291. [PMID: 34058979 PMCID: PMC8167993 DOI: 10.1186/s12859-020-03875-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/01/2020] [Accepted: 11/13/2020] [Indexed: 11/18/2022]
Abstract
BACKGROUND: DNA-binding proteins (DBPs) play a pivotal role in biological systems. A growing number of researchers are studying their mechanisms and detection methods. Detecting DBPs with traditional experimental methods is time-consuming and resource-consuming. In recent years, machine learning methods have been used to detect DBPs. However, it is difficult to adequately describe the information of proteins when predicting DNA-binding proteins. In this study, we extract six features from the protein sequence and use Multiple Kernel Learning based on Centered Kernel Alignment to integrate these features. The integrated feature is fed into a Support Vector Machine to build a predictive model and detect new DBPs. RESULTS: In our work, the data sets PDB1075 and PDB186 are employed to test our method. Our model obtains better results (accuracy) than other existing methods on PDB1075 ([Formula: see text]) and PDB186 ([Formula: see text]), respectively. CONCLUSION: Multiple kernel learning can fuse the complementary information between different features. Compared with existing methods, our method achieves comparable or the best results on benchmark data sets.
Affiliation(s)
- Yuqing Qian: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, People's Republic of China
- Limin Jiang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, People's Republic of China
- Yijie Ding: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, People's Republic of China
- Jijun Tang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, People's Republic of China
- Fei Guo: School of Computer Science and Engineering, Central South University, Changsha, People's Republic of China
|
40
|
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022]
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, the experimental methods are costly and time-consuming. Thus, it is urgent to develop computational methods for the prediction of MDAs. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve the training efficiency and accuracy. This method models both the potential connections of the feature space and the structural relationships of MDA data. The nodes of the graphs are represented by the disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity. Moreover, we considered six tasks simultaneously on the MDA prediction problem for the first time, which ensures that under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that the MDA-GCNFTG method achieved satisfactory performance on all six tasks and is significantly superior to the classic machine learning methods and the state-of-the-art MDA prediction methods. Moreover, the effectiveness of GCNs via the graph sampling strategy and the feature and topology graph in MDA-GCNFTG has also been demonstrated. More importantly, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
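One of the node features named above, the Gaussian interaction profile (GIP) kernel similarity, is usually computed as K(i, j) = exp(−γ‖IP(i) − IP(j)‖²), with γ set to γ′ divided by the mean squared norm of the interaction profiles. The sketch below follows that common definition with γ′ = 1; the abstract does not state the bandwidth used, so this value, the function name and the toy profiles are assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity.

    profiles: list of equal-length 0/1 lists; row i is the interaction
              profile of entity i (e.g. a miRNA's known disease associations).
    Returns an n x n list of lists of similarities in (0, 1].
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalised by profile size

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
        for i in range(n)
    ]


toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
kernel = gip_kernel(toy_profiles)
# identical profiles (rows 0 and 1) get similarity 1.0
```

Entities with identical interaction profiles receive similarity 1, and the similarity decays as the profiles diverge.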
Collapse
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
| | - Xuhong Wang
- School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
| | - Qiuying Dai
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
| | - Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
| | - Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
| | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, China
| | | | | | - Dennis Russell Salahub
- Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
| |
|
41
|
Li HY, You ZH, Wang L, Yan X, Li ZW. DF-MDA: An effective diffusion-based computational model for predicting miRNA-disease association. Mol Ther 2021; 29:1501-1511. [PMID: 33429082 DOI: 10.1016/j.ymthe.2021.01.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/07/2020] [Revised: 12/21/2020] [Accepted: 01/01/2021] [Indexed: 12/28/2022] Open
Abstract
MicroRNAs (miRNAs) are reported to play an important role in various human diseases. However, the mechanisms of miRNAs in these diseases are not fully understood. Therefore, detecting potential miRNA-disease associations has far-reaching significance for understanding pathological development and for the diagnosis and treatment of complex diseases. In this study, we propose a novel diffusion-based computational method, DF-MDA, for predicting miRNA-disease associations based on the assumption that molecules are related to each other in human physiological processes. Specifically, we first construct a heterogeneous network by integrating various known associations among miRNAs, diseases, proteins, long non-coding RNAs (lncRNAs), and drugs. Then, more representative features are extracted through a diffusion-based machine-learning method. Finally, a Random Forest classifier is adopted to classify miRNA-disease associations. In the 5-fold cross-validation experiment, the proposed model obtained an average area under the curve (AUC) of 0.9321 on the HMDD v3.0 dataset. To further verify the prediction performance of the proposed model, DF-MDA was applied to three significant human diseases: lymphoma, lung neoplasms, and colon neoplasms. As a result, 47, 46, and 47 of the top 50 predictions, respectively, were validated by independent databases. These experimental results demonstrate that DF-MDA is a reliable and efficient method for predicting potential miRNA-disease associations.
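For readers unfamiliar with the AUC figure quoted here, the metric can be computed directly from its rank-based definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The sketch below is a generic illustration under that definition; the function name `roc_auc` and the labels and scores are invented for the example and are not DF-MDA outputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    is scored above a random negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out fold: 1 = known association, 0 = unlabeled pair.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
```

In a 5-fold setting this value would be computed on each held-out fold and then averaged; the quadratic pairwise loop is kept for clarity rather than speed.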
Affiliation(s)
- Hao-Yuan Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
| | - Zhu-Hong You
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
| | - Lei Wang
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong 277100, China.
| | - Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
| |
|
42
|
Wang J, Li J, Yue K, Wang L, Ma Y, Li Q. NMCMDA: neural multicategory MiRNA-disease association prediction. Brief Bioinform 2021; 22:6189772. [PMID: 33778850 DOI: 10.1093/bib/bbab074] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/28/2020] [Revised: 02/05/2021] [Indexed: 01/20/2023] Open
Abstract
MOTIVATION There is growing evidence that the dysregulation of miRNAs causes diseases through various underlying mechanisms. Thus, predicting the multiple-category associations between microRNAs (miRNAs) and diseases plays an important role in investigating the roles of miRNAs in diseases. Moreover, in contrast with traditional biological experiments, which are time-consuming and expensive, computational approaches for the prediction of multicategory miRNA-disease associations are time-saving and cost-effective, and are therefore highly desirable. RESULTS We present a novel data-driven, end-to-end learning-based method of neural multiple-category miRNA-disease association prediction (NMCMDA) for predicting multiple-category miRNA-disease associations. NMCMDA has two main components: (i) an encoder that operates directly on the miRNA-disease heterogeneous network and leverages a Graph Neural Network to learn miRNA and disease latent representations, respectively; and (ii) a decoder that yields miRNA-disease association scores with the learned latent representations as input. Various kinds of encoders and decoders are proposed for NMCMDA. Finally, NMCMDA with the Relational Graph Convolutional Network encoder and the neural multirelational decoder (NMR-RGCN) achieves the best prediction performance. We compared NMCMDA with other baselines on three experimental datasets. The experimental results show that NMR-RGCN is significantly superior to the state-of-the-art method TDRC in terms of Top-1 precision, Top-1 recall, and Top-1 F1. Additionally, case studies are provided for two high-risk human diseases (namely, breast cancer and lung cancer), and we also provide the prediction and validation of the top-10 miRNA-disease-category associations based on all known data of HMDD v3.2, which further validate the effectiveness and feasibility of the proposed method.
Affiliation(s)
| | - Jin Li
- School of Software, Yunnan University, China
| | - Kun Yue
- School of Information, Yunnan University, China
| | | | | | - Qing Li
- Kunming Medical University, China
| |
|
43
|
Guo R, Teng Z, Wang Y, Zhou X, Xu H, Liu D. Integrated Learning: Screening Optimal Biomarkers for Identifying Preeclampsia in Placental mRNA Samples. Comput Math Methods Med 2021; 2021:6691096. [PMID: 33680070 PMCID: PMC7925050 DOI: 10.1155/2021/6691096] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/08/2020] [Revised: 01/17/2021] [Accepted: 01/27/2021] [Indexed: 01/28/2023]
Abstract
Preeclampsia (PE) is a maternal disease that causes maternal and child death. Treatment and preventive measures are not sound enough, and the problem of PE screening has attracted much attention. The purpose of this study is to screen placental mRNA to obtain the best PE biomarkers for identifying patients with PE. We use Limma in the R language to screen out the 48 differentially expressed genes with the largest differences and use correlation-based feature selection algorithms to reduce the dimensionality and avoid attribute redundancy arising from too many mRNA samples participating in the classification. After reducing the mRNA attributes, the mRNA samples are sorted from large to small according to information gain. In this study, a classifier model is designed to identify whether samples have PE through mRNA in the placenta. To improve the accuracy of classification and avoid overfitting, three classifiers, namely C4.5, AdaBoost, and multilayer perceptron, are used. We use the majority voting strategy integrated with the differentially expressed genes and the genes filtered by the best subset method as comparison methods to train the classifier. The results show that the classification accuracy rate increases from 79% to 82.2%, and the number of mRNA features decreases from 48 to 13. This study provides clues for the main PE biomarkers of mRNA in the placenta and provides ideas for the treatment and screening of PE.
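The majority voting strategy mentioned in this abstract can be sketched as follows. This is a generic hard-voting illustration, not the authors' pipeline: the function name `majority_vote` and the three prediction lists standing in for C4.5, AdaBoost and multilayer perceptron outputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample by
    taking the most frequent label across classifiers."""
    if not predictions:
        raise ValueError("no classifier predictions supplied")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")
    voted = []
    for sample_votes in zip(*predictions):
        # With an odd number of binary classifiers there are no ties;
        # otherwise Counter keeps the first label encountered among equals.
        label, _ = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical outputs for four samples (1 = PE, 0 = control).
c45_pred = [1, 0, 1, 0]
adaboost_pred = [1, 1, 0, 0]
mlp_pred = [0, 1, 1, 0]
ensemble_pred = majority_vote([c45_pred, adaboost_pred, mlp_pred])
```

Each sample takes the label on which at least two of the three classifiers agree, so a single classifier's error on a sample is outvoted by the other two.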
Affiliation(s)
- Rong Guo
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
| | - Zhixia Teng
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
| | - Yiding Wang
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
| | - Xin Zhou
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
| | - Heze Xu
- Department of Gynecology and Obstetrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Dan Liu
- Information and Computer Engineering College, Northeast Forestry University, Harbin 150040, China
| |
|
44
|
Huang Q, Zhou W, Guo F, Xu L, Zhang L. 6mA-Pred: identifying DNA N6-methyladenine sites based on deep learning. PeerJ 2021; 9:e10813. [PMID: 33604189 PMCID: PMC7866889 DOI: 10.7717/peerj.10813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/28/2020] [Accepted: 12/30/2020] [Indexed: 01/03/2023] Open
Abstract
With the accumulation of data on 6mA modification sites, an increasing number of scholars have begun to focus on the identification of 6mA sites. Despite the recognized importance of 6mA sites, methods for their identification remain lacking, with most existing methods being aimed at their identification in individual species. In the present study, we aimed to develop an identification method suitable for multiple species. Based on previous research, we propose a method for 6mA site recognition. Our experiments prove that the proposed 6mA-Pred method is effective for identifying 6mA sites in genes from taxa such as rice, Mus musculus, and human. A series of experimental results show that 6mA-Pred is an excellent method. We provide the source code used in the study, which can be obtained from http://39.100.246.211:5004/6mA_Pred/.
Affiliation(s)
- Qianfei Huang
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
| |
|
45
|
Lv Z, Wang P, Zou Q, Jiang Q. Identification of Sub-Golgi protein localization by use of deep representation learning features. Bioinformatics 2020; 36:5600-5609. [PMID: 33367627 PMCID: PMC8023683 DOI: 10.1093/bioinformatics/btaa1074] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 08/31/2020] [Revised: 12/10/2020] [Accepted: 12/14/2020] [Indexed: 12/11/2022] Open
Abstract
Motivation The Golgi apparatus has a key functional role in protein biosynthesis within the eukaryotic cell, with malfunction resulting in various neurodegenerative diseases. For a better understanding of the Golgi apparatus, it is essential to identify sub-Golgi protein localization. Although some machine learning methods have been used to identify sub-Golgi localization proteins by sequence representation fusion, more accurate sub-Golgi protein identification remains challenging with existing methodology. Results We developed a protein sub-Golgi localization identification protocol using deep representation learning features with 107 dimensions. With this protocol, we demonstrated that, instead of the multi-type protein sequence feature representation fusion used in previous state-of-the-art sub-Golgi protein localization classifiers, it is sufficient to exploit only one type of feature representation for more accurate identification of sub-Golgi proteins. Compared with independent testing results for benchmark datasets, our protocol is able to perform generally, reliably and robustly for sub-Golgi protein localization prediction. Availability and implementation A user-friendly webserver is freely accessible at http://isGP-DRLF.aibiochem.net and the prediction code is accessible at https://github.com/zhibinlv/isGP-DRLF. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Pingping Wang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Qinghua Jiang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
| |
|
46
|
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023] Open
Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, because of their high labor cost and long production time, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their association with diseases. First, the mainstream databases of the above three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Later, we investigate ncRNA-disease prediction methods in detail and classify these methods into five types: network propagation, recommender system, matrix completion, machine learning and deep learning. Furthermore, we provide a summary of the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of the various methods are identified, and future research directions and challenges are also discussed.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | | | - Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Wei Lan
- School of Computer, Electronics and Information at Guangxi University, Nanning, China
| | - Ning Yu
- Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA
| | - Yi Pan
- Computer Science Department at Georgia State University, Atlanta, GA, USA
| |
|
47
|
Abstract
Background Many transcripts have been generated due to the development of sequencing technologies, and lncRNA is an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental lncRNA prediction methods are time-consuming and labor-intensive. Efficient computational methods for lncRNA prediction are in demand. Results In this paper, we propose two lncRNA prediction methods based on feature ensemble learning strategies named LncPred-IEL and LncPred-ANEL. Specifically, we encode sequences into six different types of features including transcript-specified features and general sequence-derived features. Then we consider two feature ensemble strategies to utilize and integrate the information in different feature types, the iterative ensemble learning (IEL) and the attention network ensemble learning (ANEL). IEL employs a supervised iterative way to ensemble base predictors built on six different types of features. ANEL introduces an attention mechanism-based deep learning model to ensemble features by adaptively learning the weight of individual feature types. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs and other transcripts in feature space. Moreover, comparison experiments demonstrate that LncPred-IEL and LncPred-ANEL outperform several state-of-the-art methods when evaluated by 5-fold cross-validation. Both methods have good performances in cross-species lncRNA prediction. Conclusions LncPred-IEL and LncPred-ANEL are promising lncRNA prediction tools that can effectively utilize and integrate the information in different types of features.
Affiliation(s)
- Yanzhen Xu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xiaohan Zhao
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Shuai Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
| |
|
48
|
Wang W, Guan X, Khan MT, Xiong Y, Wei DQ. LMI-DForest: A deep forest model towards the prediction of lncRNA-miRNA interactions. Comput Biol Chem 2020; 89:107406. [PMID: 33120126 DOI: 10.1016/j.compbiolchem.2020.107406] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/16/2020] [Revised: 10/12/2020] [Accepted: 10/15/2020] [Indexed: 02/07/2023]
Abstract
The interactions between miRNAs and long non-coding RNAs (lncRNAs) have been the subject of intensive recent study because of their critical role in gene regulation. Computational prediction of lncRNA-miRNA interactions has become a popular alternative to experimental methods for identifying underlying interactions. It is desirable to develop machine learning-based models for the prediction of lncRNA-miRNA interactions based on the experimentally validated interactions between lncRNAs and miRNAs. The accuracy and robustness of existing models based on machine learning techniques leave room for further improvement. Considering that the attributes of lncRNAs and miRNAs are of key importance to the interaction between these two RNAs, a deep learning model, named LMI-DForest, is proposed here by combining the deep forest and autoencoder strategies. Systematic comparison on experimentally validated lncRNA-miRNA interaction datasets demonstrates that the proposed method consistently shows superior performance over the other machine learning models in lncRNA-miRNA interaction prediction.
Affiliation(s)
- Wei Wang
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaoqing Guan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Muhammad Tahir Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore Pakistan, Pakistan
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, Guangdong, China.
| |
|
49
|
Peng L, Shen L, Liao L, Liu G, Zhou L. RNMFMDA: A Microbe-Disease Association Identification Method Based on Reliable Negative Sample Selection and Logistic Matrix Factorization With Neighborhood Regularization. Front Microbiol 2020; 11:592430. [PMID: 33193260 PMCID: PMC7652725 DOI: 10.3389/fmicb.2020.592430] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/07/2020] [Accepted: 09/17/2020] [Indexed: 12/22/2022] Open
Abstract
Microbes with abnormal levels have important impacts on the formation and development of various complex diseases. Identifying possible Microbe-Disease Associations (MDAs) helps to understand the mechanisms of complex diseases. However, experimental methods for MDA identification are costly and time-consuming. In this study, a new computational model, RNMFMDA, was developed to find possible MDAs. RNMFMDA contains two main processes. First, reliable negative MDA samples were selected based on Positive-Unlabeled (PU) learning and random walk with restart on the heterogeneous microbe-disease network. Second, Logistic Matrix Factorization with Neighborhood Regularization (LMFNR) was developed to compute the association probabilities for all microbe-disease pairs. To evaluate the performance of the proposed RNMFMDA method, we compared RNMFMDA with five state-of-the-art MDA prediction methods based on five-fold cross-validations on microbes, diseases, and MDAs. As a result, RNMFMDA obtained the best AUCs of 0.6332, 0.8669, and 0.9081, respectively, for the three five-fold cross-validations, significantly outperforming other models. The promising prediction performance may be attributed to the following three features: high-quality negative MDA sample selection, the LMFNR-based MDA prediction model, and the integration of various biological information. In addition, a few predicted microbe-disease pairs with high association scores are worthy of further experimental validation.
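Random walk with restart, the propagation step named in this abstract, can be sketched in its standard form: scores are repeatedly diffused over a column-normalised adjacency matrix and pulled back toward the seed distribution with a fixed restart probability. The code below is a generic illustration on a toy four-node path graph; the function name, the `restart` value of 0.5 and the graph are assumptions and do not reproduce RNMFMDA's heterogeneous network or its negative-sample selection rule.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence,
    where W is the column-normalised adjacency matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalise so each column holds transition probabilities.
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 (hypothetical), walk seeded at node 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rwr_scores = random_walk_with_restart(adj, seeds=[0])
```

Nodes far from the seed end up with low steady-state scores; in a PU-learning setting, unlabeled pairs with the lowest scores are the natural candidates for reliable negatives, though the exact criterion used by the authors is not stated in the abstract.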
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Ling Shen
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Longjie Liao
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Guangyi Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| |
|
50
|
Zhang L, Liu B, Li Z, Zhu X, Liang Z, An J. Predicting MiRNA-disease associations by multiple meta-paths fusion graph embedding model. BMC Bioinformatics 2020; 21:470. [PMID: 33087064 PMCID: PMC7579830 DOI: 10.1186/s12859-020-03765-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/03/2020] [Accepted: 09/17/2020] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Many studies show that miRNAs have significant roles in diagnosing and treating complex human diseases. However, conventional biological experiments are too costly and time-consuming to identify unconfirmed miRNA-disease associations. Thus, computational models that efficiently predict unidentified miRNA-disease pairs are becoming a promising research topic. Although existing methods have performed well in revealing unidentified miRNA-disease associations, more work is still needed to improve prediction performance. RESULTS In this work, we present a novel multiple meta-paths fusion graph embedding model to predict unidentified miRNA-disease associations (M2GMDA). Our method takes full advantage of the complex structure and rich semantic information of miRNA-disease interactions in a self-learning way. First, a miRNA-disease heterogeneous network was derived from verified miRNA-disease pairs, miRNA similarity and disease similarity. All meta-path instances connecting miRNAs with diseases were extracted to describe intrinsic information about miRNA-disease interactions. Then, we developed a graph embedding model to predict miRNA-disease associations. The model is composed of linear transformations of miRNAs and diseases, the means encoder of a single meta-path instance, the attention-aware encoder of meta-path type and attention-aware multiple meta-path fusion. We integrated meta-path instances, meta-path based neighbours, intermediate nodes in meta-paths and more information to strengthen the prediction in our model. In particular, distinct contributions of different meta-path instances and meta-path types were combined with attention mechanisms. The data sets and source code that support the findings of this study are available at https://github.com/dangdangzhang/M2GMDA. CONCLUSIONS M2GMDA achieved AUCs of 0.9323 and 0.9182 in global leave-one-out cross validation and fivefold cross validation with HMDD v2.0. The results showed that our method outperforms other prediction methods. Three kinds of case studies with lung neoplasms, breast neoplasms, prostate neoplasms, pancreatic neoplasms, lymphoma and colorectal neoplasms demonstrated that 47, 50, 49, 48, 50 and 50, respectively, of the top 50 candidate miRNAs predicted by M2GMDA were validated by biological experiments, which further confirms the prediction performance of our method.
Affiliation(s)
- Lei Zhang
- Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
| | - Bailong Liu
- Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China.
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China.
| | - Zhengwei Li
- Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China.
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China.
| | - Xiaoyan Zhu
- Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
| | - Zhizhen Liang
- Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
| | - Jiyong An
- Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
| |
|