1
Yuan Y, Chen S, Hu R, Wang X. MutualDTA: An Interpretable Drug-Target Affinity Prediction Model Leveraging Pretrained Models and Mutual Attention. J Chem Inf Model 2025; 65:1211-1227. [PMID: 39878060 DOI: 10.1021/acs.jcim.4c01893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]
Abstract
Efficient and accurate drug-target affinity (DTA) prediction can significantly accelerate the drug development process. Recently, deep learning models have been widely applied to DTA prediction and have achieved notable success. However, existing methods often encounter several common issues: first, the data representations lack sufficient information; second, the extracted features are not comprehensive; and third, most methods lack interpretability when modeling drug-target binding. To overcome these problems, we propose an interpretable deep learning model called MutualDTA for predicting DTA. MutualDTA leverages pretrained models to obtain accurate representations of drugs and targets. It also employs dedicated modules to extract hidden features from these representations. Furthermore, the interpretability of MutualDTA is realized by the Mutual-Attention module, which (i) establishes relationships between drugs and proteins from the perspective of intermolecular interactions between drug atoms and protein amino acid residues and (ii) allows MutualDTA to capture the binding sites based on attention scores. The test results on two benchmark data sets show that MutualDTA outperforms 12 state-of-the-art models. Attention visualization experiments show that MutualDTA can capture partial interaction sites, which not only helps drug developers reduce the search space for binding sites but also demonstrates the interpretability of MutualDTA. Finally, the trained MutualDTA is applied to screen for high-affinity drugs targeting Alzheimer's disease (AD)-related proteins, and some of the screened drugs are present in the anti-AD drug library. These results demonstrate the reliability of MutualDTA in drug development.
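The abstract does not specify how the Mutual-Attention module is built, so the following is only a minimal sketch of the general idea, assuming plain scaled dot-product cross-attention between drug-atom and protein-residue feature vectors. The function names, the toy feature values, and the use of softmax-normalized scores are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mutual_attention(atom_feats, residue_feats):
    """Hypothetical cross-attention between atoms and residues.

    Returns two weight matrices: atom-to-residue (one row per atom) and
    residue-to-atom (one row per residue). Each row sums to 1.
    """
    scale = math.sqrt(len(atom_feats[0]))
    # Pairwise scaled dot products: scores[i][j] relates atom i to residue j.
    scores = [
        [sum(a * r for a, r in zip(atom, res)) / scale for res in residue_feats]
        for atom in atom_feats
    ]
    atom_to_res = [softmax(row) for row in scores]
    res_to_atom = [softmax(list(col)) for col in zip(*scores)]
    return atom_to_res, res_to_atom


# Toy inputs (made up): 2 drug atoms and 3 residues with 2-dimensional features.
atoms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
a2r, r2a = mutual_attention(atoms, residues)

# The residue with the highest attention weight for atom 0 is a candidate
# interaction site under this sketch.
top_residue_for_atom0 = max(range(len(residues)), key=lambda j: a2r[0][j])
```

Reading off the highest-weight residue per atom is one simple way attention scores could narrow the search for binding sites, in the spirit of the visualization experiments the abstract mentions.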
Affiliation(s)
- Yongna Yuan
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
- Siming Chen
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
- Rizhen Hu
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
- Xin Wang
  - School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
2
Liang SZ, Wang L, You ZH, Yu CQ, Wei MM, Wei Y, Shi TL, Jiang C. Predicting circRNA-Disease Associations through Multisource Domain-Aware Embeddings and Feature Projection Networks. J Chem Inf Model 2025; 65:1666-1676. [PMID: 39829001 DOI: 10.1021/acs.jcim.4c02250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Recent studies have highlighted the significant role of circular RNAs (circRNAs) in various diseases. Accurately predicting circRNA-disease associations is crucial for understanding their biological functions and disease mechanisms. This work introduces the MNDCDA method, designed to address the challenges posed by the limited number of known circRNA-disease associations and the high cost of biological experiments. MNDCDA integrates multiple biological data sources with neighborhood-aware embedding models and deep feature projection networks to predict potential pathways linking circRNAs to diseases. Initially, comprehensive biometric data are used to construct four similarity networks, forming a diverse circRNA-disease interaction framework. Next, a neighborhood-aware embedding model captures structural information about circRNAs and diseases, while deep feature projection networks learn high-order feature interactions and nonlinear connections. Finally, a bilinear decoder identifies novel associations between circRNAs and diseases. The MNDCDA model achieved an AUC of 0.9070 on a constructed benchmark dataset. In case studies, 25 out of 30 predicted circRNA-disease pairs were validated through wet lab experiments and published literature. These extensive experimental results demonstrate that MNDCDA is a robust computational tool for predicting circRNA-disease associations, providing valuable insights while helping to reduce research costs.
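The abstract names a bilinear decoder but gives no formula. As a rough illustration, a bilinear decoder is commonly written as a sigmoid over the product of a circRNA embedding, a learned weight matrix, and a disease embedding. The sketch below assumes that common form; the embeddings and the weight matrix are made-up toy values, and `bilinear_score` is a hypothetical name.

```python
import math


def bilinear_score(circ_vec, weight, disease_vec):
    """Score one circRNA-disease pair as sigmoid(c^T W d).

    `weight` is a list of rows with shape len(circ_vec) x len(disease_vec).
    """
    # c^T W: project the circRNA embedding into the disease embedding space.
    projected = [sum(c * w for c, w in zip(circ_vec, col)) for col in zip(*weight)]
    logit = sum(p * d for p, d in zip(projected, disease_vec))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy embeddings and weights (illustrative only).
circ = [1.0, 0.0]
W = [[0.5, -1.0],
     [2.0, 0.0]]
disease = [2.0, 0.0]

score = bilinear_score(circ, W, disease)  # logit = 1.0
```

A score near 1 would be read as a likely association and a score near 0 as an unlikely one; in a trained model, `W` and the embeddings would be learned rather than fixed by hand.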
Affiliation(s)
- Si-Zhe Liang
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Lei Wang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Meng-Meng Wei
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Yu Wei
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Tai-Long Shi
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Chen Jiang
  - School of Information Engineering, Xijing University, Xi'an 710123, China
3
Cao X, Lu P. DCSGMDA: A dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations. Comput Biol Chem 2024; 113:108201. [PMID: 39255626 DOI: 10.1016/j.compbiolchem.2024.108201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 08/17/2024] [Accepted: 08/31/2024] [Indexed: 09/12/2024]
Abstract
Numerous studies have shown that microRNAs (miRNAs) play a key role in human diseases as critical biomarkers. Their abnormal expression is often accompanied by the emergence of specific diseases. Therefore, studying the relationship between miRNAs and diseases can deepen insight into disease pathogenesis, clarify the process of disease onset and development, and promote drug research for specific diseases. However, many undiscovered relationships between miRNAs and diseases remain, significantly limiting research on miRNA-disease correlations. To explore more potential correlations, we propose a dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations (DCSGMDA). First, we constructed similarity networks for miRNAs and diseases, as well as an association relationship network. Second, potential features were fully mined using stacked deep learning and gradient decomposition networks, along with dual-channel convolutional neural networks. Finally, correlations were scored by a multilayer perceptron. We performed 5-fold and 10-fold cross-validation experiments on DCSGMDA using two datasets based on the Human MicroRNA Disease Database (HMDD). Additionally, parametric, ablation, and comparative experiments, along with case studies, were conducted. The experimental results demonstrate that DCSGMDA performs well in predicting miRNA-disease associations.
Affiliation(s)
- Xu Cao
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China.
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China.
4
Wang Y, Yin Z. Prediction of miRNA-disease association based on multisource inductive matrix completion. Sci Rep 2024; 14:27503. [PMID: 39528650 PMCID: PMC11555322 DOI: 10.1038/s41598-024-78212-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
MicroRNAs (miRNAs) are endogenous non-coding RNAs approximately 23 nucleotides in length, playing significant roles in various cellular processes. Numerous studies have shown that miRNAs are involved in the regulation of many human diseases. Accurate prediction of miRNA-disease associations is crucial for early diagnosis, treatment, and prognosis assessment of diseases. In this paper, we propose the Autoencoder Inductive Matrix Completion (AEIMC) model to identify potential miRNA-disease associations. The model captures interaction features from multiple similarity networks, including miRNA functional similarity, miRNA sequence similarity, disease semantic similarity, disease ontology similarity, and Gaussian interaction kernel similarity between miRNAs and diseases. Autoencoders are used to extract more complex and abstract data representations, which are then input into the inductive matrix completion model for association prediction. The effectiveness of the model is validated through cross-validation, stratified threshold evaluation, and case studies, while ablation experiments further confirm the necessity of introducing sequence and ontology similarities for the first time.
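Of the similarity measures listed above, the Gaussian interaction kernel can be illustrated compactly. The sketch below assumes the widely used Gaussian interaction profile (GIP) kernel, in which each miRNA is represented by its row of the known association matrix and the bandwidth is normalized by the mean squared row norm. Whether AEIMC uses exactly this normalization is not stated in the abstract; the matrix values, `gip_kernel`, and the `gamma_prime` parameter name are illustrative.

```python
import math


def gip_kernel(adjacency, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of `adjacency`.

    K[i][j] = exp(-gamma * ||row_i - row_j||^2), with
    gamma = gamma_prime / mean(||row||^2).
    """
    n = len(adjacency)
    mean_sq_norm = sum(sum(x * x for x in row) for row in adjacency) / n
    if mean_sq_norm == 0:
        raise ValueError("adjacency matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(row_i, row_j)))
            for row_j in adjacency
        ]
        for row_i in adjacency
    ]


# Toy association matrix (made up): 3 miRNAs x 3 diseases, 1 = known association.
A = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K = gip_kernel(A)
```

Passing the transposed matrix instead would give the corresponding disease-disease similarity under the same assumption.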
Affiliation(s)
- YaWei Wang
  - School of Mathematics, Physics and Statistics, Institute for Frontier Medical Technology, Center of Intelligent Computing and Applied Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- ZhiXiang Yin
  - School of Mathematics, Physics and Statistics, Institute for Frontier Medical Technology, Center of Intelligent Computing and Applied Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China.
5
Liu W, Lan Z, Li Z, Sun X, Lu X. Dual-neighbourhood information aggregation and feature fusion for prediction of miRNA-disease association. Comput Biol Med 2024; 181:109068. [PMID: 39208505 DOI: 10.1016/j.compbiomed.2024.109068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 06/23/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Studying the intricate relationship between miRNAs and diseases is crucial to prevent and treat miRNA-related disorders. Existing computational methods often overlook the importance of features of different nodes and the propagation of features among heterogeneous nodes. Many prediction models focus only on the feature coding of miRNA and diseases and ignore the importance of feature aggregation. We propose a prediction method via dual-neighbourhood feature aggregation and feature fusion, which uses multiple sources of information, aggregates information on homogeneous and heterogeneous nodes and fuses learned features to predict multiple representations of disease nodes. We constructed similarity networks of multiple homogeneous nodes based on different similarity computation methods respectively, and fused the attention mechanism by using graph convolutional networks to obtain information of different levels of importance. To alleviate the problem of sparse connectivity in the dataset, we built a two-neighbourhood heterogeneous graph neural network model to integrate the homogeneous similarity network into a miRNA-disease heterogeneous network by using known miRNA-disease association information. We used the neighbourhood information associated with the nodes in the network to perform feature aggregation. In addition, we used a feature fusion module to learn the importance of different types of nodes to predict miRNA-disease associations. Our experimental results on the Human microRNA Disease Database (HMDD v3.2) show that the model demonstrates superior performance. This work demonstrates the capability of our model to identify potential miRNAs associated with diseases through a case study of two common cancers.
Affiliation(s)
- Wei Liu
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zixin Lan
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zejun Li
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 421002, China
- Xingen Sun
  - School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
- Xu Lu
  - School of Computer Science, Guangdong Polytechnic Normal University, Guangdong Provincial Key Laboratory of Intellectual Property Big Data, Guangzhou 510665, China.
6
Lee Y. Three-Dimensional Dense Reconstruction: A Review of Algorithms and Datasets. Sensors (Basel) 2024; 24:5861. [PMID: 39338606 PMCID: PMC11435907 DOI: 10.3390/s24185861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Revised: 09/04/2024] [Accepted: 09/05/2024] [Indexed: 09/30/2024]
Abstract
Three-dimensional dense reconstruction involves extracting the full shape and texture details of three-dimensional objects from two-dimensional images. Although 3D reconstruction is a crucial and well-researched area, it remains an unsolved challenge in dynamic or complex environments. This work provides a comprehensive overview of classical 3D dense reconstruction techniques, including those based on geometric and optical models, as well as approaches leveraging deep learning. It also discusses the datasets used for deep learning and evaluates the performance and the strengths and limitations of deep learning methods on these datasets.
Affiliation(s)
- Yangming Lee
  - RoCAL Lab, Rochester Institute of Technology, Rochester, NY 14623, USA
7
Sun W, Zhang P, Zhang W, Xu J, Huang Y, Li L. Synchronous Mutual Learning Network and Asynchronous Multi-Scale Embedding Network for miRNA-Disease Association Prediction. Interdiscip Sci 2024; 16:532-553. [PMID: 38310628 DOI: 10.1007/s12539-023-00602-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 02/06/2024]
Abstract
MicroRNA (miRNA) serves as a pivotal regulator of numerous cellular processes, and the identification of miRNA-disease associations (MDAs) is crucial for comprehending complex diseases. Recently, graph neural networks (GNN) have made significant advancements in MDA prediction. However, these methods tend to learn one type of node representation from a single heterogeneous network, ignoring the importance of multiple network topologies and node attributes. Here, we propose SMDAP (Sequence hierarchical modeling-based Mirna-Disease Association Prediction framework), a novel GNN-based framework that incorporates multiple network topologies and various node attributes including miRNA seed and full-length sequences to predict potential MDAs. Specifically, SMDAP consists of two types of MDA representation: following a heterogeneous pattern, we construct a transfer learning-like synchronous mutual learning network to learn the first MDA representation in conjunction with the miRNA seed sequence. Meanwhile, following a homogeneous pattern, we design a subgraph-inspired asynchronous multi-scale embedding network to obtain the second MDA representation based on the miRNA full-length sequence. Subsequently, an adaptive fusion approach is designed to combine the two branches such that we can score the MDAs by the downstream classifier and infer novel MDAs. Comprehensive experiments demonstrate that SMDAP integrates the advantages of multiple network topologies and node attributes into two branch representations. Moreover, the area under the receiver operating characteristic curve is 0.9622 on DB1, which is a 5.06% increase from the baselines. The area under the precision-recall curve is 0.9777, which is a 7.33% increase from the baselines. In addition, case studies on three human cancers validated the predictive performance of SMDAP. Overall, SMDAP represents a powerful tool for MDA prediction.
Affiliation(s)
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ping Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Li Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China.
8
Hou B, Yu D, Bai H, Du X. Research Progress of miRNA in Heart Failure: Prediction and Treatment. J Cardiovasc Pharmacol 2024; 84:136-145. [PMID: 38922572 DOI: 10.1097/fjc.0000000000001588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 04/23/2024] [Indexed: 06/27/2024]
Abstract
This review summarizes the multiple roles of microRNAs (miRNAs) in the prediction and treatment of heart failure (HF), including the molecular mechanisms regulating cell apoptosis, myocardial fibrosis, cardiac hypertrophy, and ventricular remodeling, and highlights the importance of miRNAs in the prognosis of HF. In addition, the strategies for alleviating HF with miRNA intervention are discussed. On the basis of the challenges and emerging directions in the research and clinical practice of HF miRNAs, it is proposed that miRNA-based therapy could be a new approach for prevention and treatment of HF.
Affiliation(s)
- Bingyan Hou
  - Key Laboratory of Chinese Materia Medica, Ministry of Education, Pharmaceutical College, Heilongjiang University of Chinese Medicine, Harbin, China
9
Guo Y, Yi M. THGNCDA: circRNA-disease association prediction based on triple heterogeneous graph network. Brief Funct Genomics 2024; 23:384-394. [PMID: 37738503 DOI: 10.1093/bfgp/elad042] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/24/2023] [Revised: 08/21/2023] [Accepted: 09/04/2023] [Indexed: 09/24/2023]
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNA molecules featuring a closed circular structure. They have been shown to play a significant role in the reduction of many diseases. In addition, many studies on the clinical diagnosis and treatment of disease have revealed that circRNAs can be considered potential biomarkers. Therefore, understanding the association between circRNAs and diseases can help to forecast some disorders of life activities. However, traditional biological experimental methods are time-consuming. Machine learning-based prediction of circRNA-disease associations, the most common alternative, avoids this cost but relies on diverse data. Nevertheless, topological information about circRNAs and diseases is usually not used in these methods. Moreover, circRNAs can be associated with diseases through miRNAs. With these considerations, we propose a novel method, named THGNCDA, to predict associations between circRNAs and diseases. Specifically, for a given circRNA-disease pair, we employ a graph neural network with attention to learn the importance of each of its neighbors. In addition, we use a multilayer convolutional neural network to explore the relationship of a circRNA-disease pair based on their attributes. When calculating embeddings, we introduce miRNA information. The experimental results show that THGNCDA outperformed the SOTA methods. In addition, our method gives a better recall rate. To confirm the significance of attention, we conducted extensive ablation studies. Case studies on Urinary Bladder and Prostatic Neoplasms further show THGNCDA's ability to discover known relationships between circRNA candidates and diseases.
Affiliation(s)
- Yuwei Guo
  - School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
- Ming Yi
  - School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
10
Xuan P, Wang X, Cui H, Meng X, Nakaguchi T, Zhang T. Meta-Path Semantic and Global-Local Representation Learning Enhanced Graph Convolutional Model for Disease-Related miRNA Prediction. IEEE J Biomed Health Inform 2024; 28:4306-4316. [PMID: 38709611 DOI: 10.1109/jbhi.2024.3397003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Dysregulation of miRNAs is closely related to the progression of various diseases, so identifying disease-related miRNAs is crucial. Most recently proposed methods are based on graph reasoning, but they do not fully exploit the topological structure composed of the higher-order neighbor nodes or the global and local features of miRNA and disease nodes. We propose a prediction method, MDAP, to learn semantic features of miRNA and disease nodes based on various meta-paths, as well as node features from the perspective of the entire heterogeneous network, and node pair attributes. First, for both the miRNA and disease nodes, node category-wise meta-paths were constructed to integrate the similarity and association connection relationships. Each target node has its specific neighbor nodes for each meta-path, and the neighbors of longer meta-paths constitute its higher-order neighbor topological structure. Second, we constructed a meta-path specific graph convolutional network module to integrate the features of higher-order neighbors and their topology, and then learned the semantic representations of nodes. Third, for the entire miRNA-disease heterogeneous network, a global-aware graph convolutional autoencoder was built to learn the network-view feature representations of nodes. We also designed semantic-level and representation-level attentions to obtain informative semantic features and node representations. Finally, a strategy based on parallel convolutional-deconvolutional neural networks was designed to enhance the local feature learning for a pair of miRNA and disease nodes. The experimental results showed that MDAP outperformed other state-of-the-art methods, and the ablation experiments demonstrated the effectiveness of MDAP's major innovations. MDAP's ability to discover potential disease-related miRNAs was further analyzed through case studies of three diseases.
11
Zhao BW, He YZ, Su XR, Yang Y, Li GD, Huang YA, Hu PW, You ZH, Hu L. Motif-Aware miRNA-Disease Association Prediction via Hierarchical Attention Network. IEEE J Biomed Health Inform 2024; 28:4281-4294. [PMID: 38557614 DOI: 10.1109/jbhi.2024.3383591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
As post-transcriptional regulators of gene expression, micro-ribonucleic acids (miRNAs) are regarded as potential biomarkers for a variety of diseases. Hence, the prediction of miRNA-disease associations (MDAs) is of great significance for an in-depth understanding of disease pathogenesis and progression. Existing prediction models mainly concentrate on incorporating different sources of biological information to perform the MDA prediction task while failing to consider the full potential utility of MDA network information at the motif level. To overcome this problem, we propose a novel motif-aware MDA prediction model, namely MotifMDA, by fusing a variety of high- and low-order structural information. In particular, we first design several motifs of interest considering their ability to characterize how miRNAs are associated with diseases through different network structural patterns. Then, MotifMDA adopts a two-layer hierarchical attention to identify novel MDAs. Specifically, the first attention layer learns high-order motif preferences based on their occurrences in the given MDA network, while the second one learns the final embeddings of miRNAs and diseases through coupling high- and low-order preferences. Experimental results on two benchmark datasets have demonstrated the superior performance of MotifMDA over several state-of-the-art prediction models. This strongly indicates that accurate MDA prediction can be achieved by relying solely on MDA network information. Furthermore, our case studies indicate that the incorporation of motif-level structure information allows MotifMDA to discover novel MDAs from different perspectives.
12
Dong B, Sun W, Xu D, Wang G, Zhang T. MDformer: A transformer-based method for predicting miRNA-Disease associations using multi-source feature fusion and maximal meta-path instances encoding. Comput Biol Med 2023; 167:107585. [PMID: 37890424 DOI: 10.1016/j.compbiomed.2023.107585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 09/15/2023] [Accepted: 10/15/2023] [Indexed: 10/29/2023]
Abstract
There is a growing body of evidence suggesting that microRNAs (miRNAs), small biological molecules, play a crucial role in the diagnosis, treatment, and prognostic assessment of diseases. However, it is often inefficient to verify miRNA-disease associations (MDAs) through traditional experimental methods. In response, researchers have proposed various computational methods, but the existing methods often have drawbacks in terms of predictive effectiveness and accuracy. Therefore, to improve the prediction performance of computational methods, we propose a transformer-based prediction model (MDformer) for multi-source feature information. Specifically, we first consider multiple features of miRNAs and diseases from the molecular biology perspective and fuse them. Then, high-quality node feature embeddings were generated using a feature encoder based on the transformer architecture and meta-path instances. Finally, a deep neural network was built for MDA prediction. To evaluate the performance of our model, we performed multiple 5-fold cross-validations as well as comparison experiments on the HMDD v3.2 and HMDD v2.0 databases, and the average area under the ROC curve (AUC) was higher than that of the compared methods on both databases, at 0.9506 and 0.9369, respectively. We conducted case studies on five highly lethal cancers (breast, lung, colorectal, gastric, and hepatocellular cancers), and the first 30 predictions for these five diseases achieved 97.3% accuracy. In conclusion, MDformer is a reliable and scientifically sound tool that can be used to accurately predict MDAs. In addition, the source code is available at https://github.com/Linda908/MDformer.
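Several entries in this list report an average AUC under k-fold cross-validation. As a reminder of what that number measures, the sketch below computes ROC AUC as the fraction of positive-negative pairs ranked correctly (ties count half) and averages it over folds. The labels and scores are made-up toy values, and `roc_auc` and `mean_auc` are hypothetical helper names, not code from any of the listed papers.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise ranking: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, one pair per held-out fold."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)


# Two toy held-out folds (illustrative values only).
folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]),  # 3 of 4 pairs ranked correctly
    ([1, 1, 0], [0.8, 0.7, 0.2]),          # all pairs ranked correctly
]
average = mean_auc(folds)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; sorting-based implementations are the usual choice for full datasets.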
Affiliation(s)
- Benzhi Dong
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Weidong Sun
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Dali Xu
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China.
- Tianjiao Zhang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China.
13
Lu Z, Zhong H, Tang L, Luo J, Zhou W, Liu L. Predicting lncRNA-disease associations based on heterogeneous graph convolutional generative adversarial network. PLoS Comput Biol 2023; 19:e1011634. [PMID: 38019786 PMCID: PMC10686445 DOI: 10.1371/journal.pcbi.1011634] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/18/2023] [Accepted: 10/25/2023] [Indexed: 12/01/2023]
Abstract
There is a growing body of evidence indicating the crucial roles that long non-coding RNAs (lncRNAs) play in the development and progression of various diseases, including cancers, cardiovascular diseases, and neurological disorders. However, accurately predicting potential lncRNA-disease associations remains a challenge, as existing methods have limitations in extracting heterogeneous association information and handling sparse and unbalanced data. To address these issues, we propose a novel computational method, called HGC-GAN, which combines heterogeneous graph convolutional neural networks (GCN) and generative adversarial networks (GAN) to predict potential lncRNA-disease associations. Specifically, we construct a lncRNA-miRNA-disease heterogeneous network by integrating multiple association data and sequence information. The GCN-based generator is then employed to aggregate neighbor information of nodes and obtain node embeddings, which are used to predict lncRNA-disease associations. Meanwhile, the GAN-based discriminator is trained to distinguish between real and fake lncRNA-disease associations generated by the generator, enabling the generator to improve its ability to generate accurate lncRNA-disease associations gradually. Our experimental results demonstrate that HGC-GAN performs better in predicting potential lncRNA-disease associations, with AUC and AUPR values of 0.9591 and 0.9606, respectively, under 10-fold cross-validation. Moreover, our case study further confirms the effectiveness of HGC-GAN in predicting potential lncRNA-disease associations, even for novel lncRNAs without any known lncRNA-disease associations. Overall, our proposed method HGC-GAN provides a promising approach to predict potential lncRNA-disease associations and may have important implications for disease diagnosis, treatment, and drug development.
Affiliation(s)
- Zhonghao Lu
  - School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
- Hua Zhong
  - School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
- Lin Tang
  - Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan Normal University, Yunnan, People’s Republic of China
- Jing Luo
  - State Key Laboratory for Conservation and Utilization of Bio-resource in Yunnan, School of Life Sciences and School of Ecology and Environment, Yunnan University, Kunming, People’s Republic of China
- Wei Zhou
  - School of Software, Yunnan University, Kunming, People’s Republic of China
- Lin Liu
  - School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
14
Dong B, Sun W, Xu D, Wang G, Zhang T. DAEMDA: A Method with Dual-Channel Attention Encoding for miRNA-Disease Association Prediction. Biomolecules 2023; 13:1514. [PMID: 37892196 PMCID: PMC10604960 DOI: 10.3390/biom13101514] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/15/2023] [Accepted: 10/08/2023] [Indexed: 10/29/2023]
Abstract
A growing number of studies have shown that aberrant microRNA (miRNA) expression is closely associated with the evolution and development of various complex human diseases. Identifying and observing these key biomarkers is important for gaining a deeper understanding of disease pathogenesis and therapeutic mechanisms. Consequently, pinpointing potential miRNA-disease associations (MDA) has become a prominent bioinformatics subject, and advances in graph neural networks (GNN) have encouraged several new computational methods. Nevertheless, these existing methods commonly fail to exploit the global feature information of network nodes, leaving the generation of high-quality embedding representations from graph properties as a critical unsolved issue. To address these challenges, we introduce DAEMDA, a computational method designed to improve on the efficacy of current models. First, we construct similarity and heterogeneous networks involving miRNAs and diseases, relying on experimentally corroborated miRNA-disease association data and analogous information. Then, a new parallel dual-channel feature encoder captures the global information within the heterogeneous network and generates varying embedding representations. Finally, using a neural network classifier, we merge the dual-channel embedding representations and predict associations between miRNA and disease nodes. The experimental results of five-fold cross-validation and case studies of major diseases based on the HMDD v3.2 database show that this method can generate high-quality embedded representations and effectively improve the accuracy of MDA prediction.
Affiliation(s)
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Tianjiao Zhang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
|
15
|
Zhao H, Duan G, Ni P, Yan C, Li Y, Wang J. RNPredATC: A Deep Residual Learning-Based Model With Applications to the Prediction of Drug-ATC Code Association. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2712-2723. [PMID: 34110998 DOI: 10.1109/tcbb.2021.3088256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
The Anatomical Therapeutic Chemical (ATC) classification system, designated by the World Health Organization Collaborating Center (WHOCC), has been widely used in drug screening, repositioning, and similarity research. The ATC classification system assigns different codes to drugs according to the organ or system on which they act and/or their therapeutic and chemical characteristics. Correctly identifying the potential ATC codes for drugs can accelerate drug development and reduce the cost of experiments. Several classifiers have been proposed in this regard. However, they lack the ability to learn basic features from sparsely known drug-ATC code associations. Therefore, there is an urgent need for novel computational methods to precisely predict potential drug-ATC code associations at multiple levels of the ATC classification system based on known associations between drugs and ATC codes. In this paper, we provide a novel end-to-end model, called RNPredATC, to predict potential drug-ATC code associations at five ATC classification levels. RNPredATC can extract dense feature vectors from sparsely known drug-ATC code associations and reduce the impact of the degradation problem through a novel form of deep residual learning. We extensively compare our method with some state-of-the-art methods, including NetPredATC, SPACE, and some multi-label-based methods. Our experimental results show that RNPredATC achieves better performance in five-fold and ten-fold cross-validation. Furthermore, the visualization analysis of hidden layers and case studies of predicted associations at the fifth ATC classification level confirm that RNPredATC can effectively identify the potential ATC codes of drugs.
|
16
|
Wang MN, Li Y, Lei LL, Ding DW, Xie XJ. Combining non-negative matrix factorization with graph Laplacian regularization for predicting drug-miRNA associations based on multi-source information fusion. Front Pharmacol 2023; 14:1132012. [PMID: 36817132 PMCID: PMC9931722 DOI: 10.3389/fphar.2023.1132012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 01/16/2023] [Indexed: 02/05/2023]
Abstract
Increasing evidence suggests that miRNAs play a key role in the occurrence and progression of many complex human diseases. Therefore, targeting dysregulated miRNAs with small-molecule drugs in the clinic has become a new treatment strategy. Nevertheless, identifying drug-targeted miRNAs through biological experiments is costly and time-consuming. Thus, more reliable computational methods for identifying associations between drugs and miRNAs urgently need to be developed. In this study, we proposed an efficient method, called GNMFDMA, to predict potential drug-miRNA associations by combining graph Laplacian regularization with non-negative matrix factorization. We first calculated the overall similarity matrices of drugs and miRNAs from the different kinds of biological information collected. Subsequently, the drug-miRNA association adjacency matrix was reformulated based on K nearest neighbor profiles so as to correct false negative associations. Finally, graph Laplacian regularized collaborative non-negative matrix factorization was used to calculate the association scores of drugs with miRNAs. In cross-validation, GNMFDMA obtained an AUC of 0.9193, outperforming the existing methods. In addition, in case studies on three common drugs (i.e., 5-Aza-CdR, 5-FU and Gemcitabine), 30, 31 and 34 of the top-50 associations inferred by GNMFDMA were verified. These results reveal that GNMFDMA is a reliable and efficient computational approach for identifying potential drug-miRNA associations.
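The K-nearest-neighbor reformulation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the weighting scheme here (a rank-based `decay` factor multiplied by similarity) and the function name `knn_profiles` are assumptions chosen for illustration, not the authors' formulation. The idea shown is only that a zero entry for a drug can be softened toward the associations of its most similar drugs.

```python
def knn_profiles(assoc, sim, k=2, decay=0.5):
    """Soften likely false negatives in a binary association matrix.

    assoc: list of rows (one row per drug), entries 0/1 over miRNAs.
    sim:   square drug-drug similarity matrix in the same row order.
    Each row becomes the element-wise maximum of itself and a
    similarity-weighted average of its k most similar other rows.
    (Hypothetical weighting; the paper's exact scheme may differ.)
    """
    n = len(assoc)
    m = len(assoc[0])
    result = []
    for i in range(n):
        # Rank the other rows by similarity to row i, most similar first.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: sim[i][j], reverse=True)[:k]
        # The neighbour at rank r gets weight decay**r * similarity.
        weights = [(decay ** r) * sim[i][j] for r, j in enumerate(neighbours)]
        norm = sum(sim[i][j] for j in neighbours)
        row = []
        for c in range(m):
            if norm > 0:
                est = sum(w * assoc[j][c]
                          for w, j in zip(weights, neighbours)) / norm
            else:
                est = 0.0
            # Known associations are never lowered.
            row.append(max(assoc[i][c], est))
        result.append(row)
    return result


assoc = [[1, 0], [1, 1], [0, 0]]
sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
filled = knn_profiles(assoc, sim, k=2)
```

In this toy input, the zero entry of the first drug is raised above zero because its most similar neighbor has that association, while every known association keeps its value of 1.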
Affiliation(s)
- Mei-Neng Wang
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- Yu Li
  - School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, China
  - Corresponding author
- Li-Lan Lei
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- De-Wu Ding
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- Xue-Jun Xie
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
|
17
|
He Y, Yang Y, Su X, Zhao B, Xiong S, Hu L. Incorporating higher order network structures to improve miRNA-disease association prediction based on functional modularity. Brief Bioinform 2023; 24:6958503. [PMID: 36562706 DOI: 10.1093/bib/bbac562] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/02/2022] [Revised: 10/29/2022] [Accepted: 11/19/2022] [Indexed: 12/24/2022]
Abstract
As microRNAs (miRNAs) are involved in many essential biological processes, their abnormal expressions can serve as biomarkers and prognostic indicators to prevent the development of complex diseases, thus providing accurate early detection and prognostic evaluation. Although a number of computational methods have been proposed to predict miRNA-disease associations (MDAs) for further experimental verification, their performance is limited primarily by the inadequacy of exploiting lower order patterns characterizing known MDAs to identify missing ones from MDA networks. Hence, in this work, we present a novel prediction model, namely HiSCMDA, by incorporating higher order network structures for improved performance of MDA prediction. To this end, HiSCMDA first integrates miRNA similarity network, disease similarity network and MDA network to preserve the advantages of all these networks. After that, it identifies overlapping functional modules from the integrated network by predefining several higher order connectivity patterns of interest. Last, a path-based scoring function is designed to infer potential MDAs based on network paths across related functional modules. HiSCMDA yields the best performance across all datasets and evaluation metrics in the cross-validation and independent validation experiments. Furthermore, in the case studies, 49 and 50 out of the top 50 miRNAs, respectively, predicted for colon neoplasms and lung neoplasms have been validated by well-established databases. Experimental results show that rich higher order organizational structures exposed in the MDA network gain new insight into the MDA prediction based on higher order connectivity patterns.
Affiliation(s)
- Yizhou He
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Yue Yang
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Xiaorui Su
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Bowei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Shengwu Xiong
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
|
18
|
Zheng K, Zhang XL, Wang L, You ZH, Ji BY, Liang X, Li ZW. SPRDA: a link prediction approach based on the structural perturbation to infer disease-associated Piwi-interacting RNAs. Brief Bioinform 2023; 24:6850564. [PMID: 36445194 DOI: 10.1093/bib/bbac498] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/15/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 11/30/2022]
Abstract
piRNAs and PIWI proteins have been confirmed as novel biomarkers for disease diagnosis and treatment due to their abnormal expression in various cancers. However, current research is not sufficient to clarify the functions of piRNAs in cancer and the underlying mechanisms. Therefore, providing reliable piRNA candidates on a large scale for biological research has become a pressing issue. In this study, a novel computational model based on the structural perturbation method, called SPRDA, is proposed to predict potential disease-associated piRNAs. Notably, SPRDA belongs to positive-unlabeled learning, which, in contrast to previous approaches, is unaffected by negative examples. In 5-fold cross-validation, SPRDA shows high performance on the benchmark dataset piRDisease, with an AUC of 0.9529. Furthermore, the predictive performance of SPRDA for 10 diseases shows the robustness of the proposed method. Overall, the proposed approach can provide unique insights into the pathogenesis of the disease and will advance the field of oncology diagnosis and treatment.
Affiliation(s)
- Kai Zheng
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Xin-Lu Zhang
  - Civil Product General Research Institute, The 36th Research Institute of China Electronics Technology Group Corporation, Jiaxing, 314000, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
- Bo-Ya Ji
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410006, China
- Xiao Liang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Zheng-Wei Li
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning, 530007, China
|
19
|
Liao Q, Ye Y, Li Z, Chen H, Zhuo L. Prediction of miRNA-disease associations in microbes based on graph convolutional networks and autoencoders. Front Microbiol 2023; 14:1170559. [PMID: 37187536 PMCID: PMC10175670 DOI: 10.3389/fmicb.2023.1170559] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2023] [Accepted: 03/21/2023] [Indexed: 05/17/2023]
Abstract
MicroRNAs (miRNAs) are short RNA molecular fragments that regulate gene expression by targeting and inhibiting the expression of specific RNAs. Because microRNAs affect many diseases in microbial ecology, it is necessary to predict microRNAs' association with diseases at the microbial level. To this end, we propose a novel model, termed GCNA-MDA, in which a dual autoencoder and a graph convolutional network (GCN) are integrated to predict miRNA-disease associations. The proposed method leverages autoencoders to extract robust representations of miRNAs and diseases and, at the same time, exploits the GCN to capture the topological information of miRNA-disease networks. To alleviate the impact of insufficient information in the original data, the association similarity and feature similarity data are combined to calculate a more complete initial basic vector for each node. The experimental results on the benchmark datasets demonstrate that, compared with existing representative methods, the proposed method achieves superior performance, with precision reaching 0.8982. These results demonstrate that the proposed method can serve as a tool for exploring miRNA-disease associations in microbial environments.
Affiliation(s)
- Qingquan Liao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yuxiang Ye
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Zihang Li
  - School of Computing and Data Science, Xiamen University Malaysia, Sepang, Selangor, Malaysia
- Hao Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
  - Corresponding author
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
|
20
|
Zhang ML, Zhao BW, Su XR, He YZ, Yang Y, Hu L. RLFDDA: a meta-path based graph representation learning model for drug-disease association prediction. BMC Bioinformatics 2022; 23:516. [PMID: 36456957 PMCID: PMC9713188 DOI: 10.1186/s12859-022-05069-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Accepted: 11/21/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND Drug repositioning is a very important task that provides critical information for exploring the potential efficacy of drugs. Yet developing computational models that can effectively predict drug-disease associations (DDAs) is still a challenging task. Previous studies suggest that the accuracy of DDA prediction can be improved by integrating different types of biological features. But how to conduct an effective integration remains a challenging problem for accurately discovering new indications for approved drugs. METHODS In this paper, we propose a novel meta-path based graph representation learning model, namely RLFDDA, to predict potential DDAs on heterogeneous biological networks. RLFDDA first calculates drug-drug similarities and disease-disease similarities as the intrinsic biological features of drugs and diseases. A heterogeneous network is then constructed by integrating DDAs, disease-protein associations and drug-protein associations. With such a network, RLFDDA adopts a meta-path random walk model to learn the latent representations of drugs and diseases, which are concatenated to construct joint representations of drug-disease associations. As the last step, we employ the random forest classifier to predict potential DDAs with their joint representations. RESULTS To demonstrate the effectiveness of RLFDDA, we have conducted a series of experiments on two benchmark datasets by following a ten-fold cross-validation scheme. The results show that RLFDDA yields the best performance in terms of AUC and F1-score when compared with several state-of-the-art DDA prediction models. We have also conducted a case study on a common drug and a common disease, i.e., paclitaxel and lung tumors, and found that 7 out of the top-10 diseases and 8 out of the top-10 drugs have already been validated for paclitaxel and lung tumors, respectively, with literature evidence.
CONCLUSION Hence, the promising performance of RLFDDA may provide a new perspective for novel DDA discovery over heterogeneous networks.
Affiliation(s)
- Meng-Long Zhang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Yi-Zhou He
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Yue Yang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
|
21
|
Zheng K, Zhang XL, Wang L, You ZH, Zhan ZH, Li HY. Line graph attention networks for predicting disease-associated Piwi-interacting RNAs. Brief Bioinform 2022; 23:6748487. [PMID: 36198846 DOI: 10.1093/bib/bbac393] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/06/2022] [Revised: 08/08/2022] [Accepted: 08/12/2022] [Indexed: 12/14/2022]
Abstract
PIWI proteins and Piwi-Interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the combinatorial explosion of possible ncRNA-disease pairs gradually becomes apparent, new bioinformatics methods for large-scale identification and prioritization of potential associations are of interest. However, in the real world, the network of interactions between molecules is enormously intricate and noisy, which poses a problem for efficient graph mining. Line graphs can extend many heterogeneous networks to replace dichotomous networks. In this study, we present a new graph neural network framework, line graph attention networks (LGAT), and apply it to predict piRNA-disease associations (GAPDA). In the experiment, GAPDA performs excellently in 5-fold cross-validation, with an AUC of 0.9038. It also outperforms methods based on collaborative filtering and attribute features. The experimental results show the promise of graph neural networks on such problems and suggest that GAPDA can be an excellent supplement for future biomedical research.
Affiliation(s)
- Kai Zheng
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhao-Hui Zhan
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Hao-Yuan Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
|
22
|
Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: towards systematic evaluation of computational models. Brief Bioinform 2022; 23:6712303. [PMID: 36151749 DOI: 10.1093/bib/bbac407] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 06/18/2022] [Revised: 08/11/2022] [Accepted: 08/20/2022] [Indexed: 12/14/2022]
Abstract
Currently, there exist no generally accepted strategies of evaluating computational models for microRNA-disease associations (MDAs). Though K-fold cross validations and case studies seem to be must-have procedures, the value of K, the evaluation metrics, and the choice of query diseases as well as the inclusion of other procedures (such as parameter sensitivity tests, ablation studies and computational cost reports) are all determined on a case-by-case basis and depending on the researchers' choices. In the current review, we include a comprehensive analysis on how 29 state-of-the-art models for predicting MDAs were evaluated. Based on the analytical results, we recommend a feasible evaluation workflow that would suit any future model to facilitate fair and systematic assessment of predictive performance.
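The two evaluation ingredients this abstract names, K-fold cross-validation and an evaluation metric, can be made concrete with a short sketch. The function names `auc` and `k_fold_indices`, the fixed shuffle seed, and the choice of AUC as the metric are illustrative assumptions and are not taken from the review. AUC is computed with the rank-based (Mann-Whitney) identity, counting ties as half a win.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity.

    labels: 0/1 ground-truth values; scores: predicted scores.
    Returns the fraction of (positive, negative) pairs in which the
    positive is scored higher, with ties counted as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


splits = list(k_fold_indices(10, 5))
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
tied = auc([1, 0], [0.5, 0.5])
```

Changing `k` or `seed` changes which samples land in each test fold, which is one reason results reported under different settings are hard to compare directly.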
Affiliation(s)
- Li Huang
  - Academy of Arts and Design, Tsinghua University, Beijing, 10084, China
  - The Future Laboratory, Tsinghua University, Beijing, 10084, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
  - Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
|
23
|
Li Y, Hu XG, Wang L, Li PP, You ZH. MNMDCDA: prediction of circRNA-disease associations by learning mixed neighborhood information from multiple distances. Brief Bioinform 2022; 23:6831006. [PMID: 36384071 DOI: 10.1093/bib/bbac479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 09/25/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022]
Abstract
Emerging evidence suggests that circular RNA (circRNA) is an important regulator of a variety of pathological processes and serves as a promising biomarker for many complex human diseases. Nevertheless, there are relatively few known circRNA-disease associations, and uncovering new circRNA-disease associations by wet-lab methods is time consuming and costly. Considering the limitations of existing computational methods, we propose a novel approach named MNMDCDA, which combines high-order graph convolutional networks (high-order GCNs) and deep neural networks to infer associations between circRNAs and diseases. Firstly, we computed different biological attribute information of circRNA and disease separately and used them to construct multiple multi-source similarity networks. Then, we used the high-order GCN algorithm to learn feature embedding representations with high-order mixed neighborhood information of circRNA and disease from the constructed multi-source similarity networks, respectively. Finally, the deep neural network classifier was implemented to predict associations of circRNAs with diseases. The MNMDCDA model obtained AUC scores of 95.16%, 94.53%, 89.80% and 91.83% on four benchmark datasets, i.e., CircR2Disease, CircAtlas v2.0, Circ2Disease and CircRNADisease, respectively, using the 5-fold cross-validation approach. Furthermore, 25 of the top 30 circRNA-disease pairs with the best scores of MNMDCDA in the case study were validated by recent literature. Numerous experimental results indicate that MNMDCDA can be used as an effective computational tool to predict circRNA-disease associations and can provide the most promising candidates for biological experiments.
Affiliation(s)
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Xue-Gang Hu
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Pei-Pei Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
|
24
|
Li W, Wang S, Xu J, Xiang J. Inferring Latent MicroRNA-Disease Associations on a Gene-Mediated Tripartite Heterogeneous Multiplexing Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3190-3201. [PMID: 35041612 DOI: 10.1109/tcbb.2022.3143770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
MicroRNA (miRNA) is a class of non-coding single-stranded RNA molecules encoded by endogenous genes with a length of about 22 nucleotides. MiRNAs have been successfully identified as differentially expressed in various cancers. There is evidence that disorders of miRNAs are associated with a variety of complex diseases. Therefore, inferring potential miRNA-disease associations (MDAs) is very important for understanding the aetiology and pathogenesis of many diseases and is useful for disease diagnosis, prognosis and treatment. First, we creatively fused multiple similarity subnetworks from multiple sources for miRNAs, genes and diseases, respectively, using multiplexing technology. Then, the three multiplexed biological subnetworks are connected through the extended binary association to form a tripartite complete heterogeneous multiplexed network (Tri-HM). Finally, because the constructed Tri-HM network retains the subnetworks' original topology and biological functions and expands the binary association and dependence between the three biological entities, rich neighbourhood information is obtained iteratively from neighbours by a non-equilibrium random walk. In global 5-fold cross-validation, our Tri-HM-RWR model obtained an AUC value of 0.8657 and an AUPR value of 0.2139, which shows that our model can infer disease-related miRNAs more fully.
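The iterative gathering of neighbourhood information by a random walk can be sketched on a toy network. The abstract describes a non-equilibrium random walk on a tripartite multiplexed network; the code below swaps in the standard random walk with restart on a single small undirected graph as a simplified stand-in. The function name, the restart probability of 0.5, and the four-node path graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to a set of seed nodes.

    adj:   square adjacency matrix (list of lists, non-negative weights).
    seeds: set of seed node indices where the walker restarts.
    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalised adjacency matrix, until the L1 change is below tol.
    """
    n = len(adj)
    # Column-normalise so each non-empty column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0
              for j in range(n)] for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
scores = random_walk_with_restart(path, {0})
```

On this path graph the scores fall off with distance from the seed, which is the property used to rank candidate nodes relative to known associations.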
|
25
|
Cao B, Li R, Xiao S, Deng S, Zhou X, Zhou L. Predicting miRNA-disease association through combining miRNA function and network topological similarities based on MINE. iScience 2022; 25:105299. [DOI: 10.1016/j.isci.2022.105299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Revised: 07/08/2022] [Accepted: 09/28/2022] [Indexed: 11/16/2022]
|
26
|
Dong TN, Schrader J, Mücke S, Khosla M. A message passing framework with multiple data integration for miRNA-disease association prediction. Sci Rep 2022; 12:16259. [PMID: 36171337 PMCID: PMC9519928 DOI: 10.1038/s41598-022-20529-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2022] [Accepted: 09/14/2022] [Indexed: 11/08/2022]
Abstract
Micro RNA or miRNA is a highly conserved class of non-coding RNA that plays an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and finding potential drug targets. We propose a biologically-motivated data-driven approach for the miRNA-disease association prediction, which overcomes the data scarcity problem by exploiting information from multiple data sources. The key idea is to enrich the existing miRNA/disease-protein-coding gene (PCG) associations via a message passing framework, followed by the use of disease ontology information for further feature filtering. The enriched and filtered PCG associations are then used to construct the inter-connected miRNA-PCG-disease network to train a structural deep network embedding (SDNE) model. Finally, the pre-trained embeddings and the biologically relevant features from the miRNA family and disease semantic similarity are concatenated to form the pair input representations to a Random Forest classifier whose task is to predict the miRNA-disease association probabilities. We present large-scale comparative experiments, ablation, and case studies to showcase our approach's superiority. Besides, we make the model prediction results for 1618 miRNAs and 3679 diseases, along with all related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster assessments and future adoption.
Affiliation(s)
- Thi Ngan Dong
  - L3S Research Center, Leibniz University of Hannover, Hannover, Germany
- Johanna Schrader
  - L3S Research Center, Leibniz University of Hannover, Hannover, Germany
- Stefanie Mücke
  - Hannover Unified Biobank (HUB), Hannover Medical School, Hannover, Germany
- Megha Khosla
  - Delft University of Technology (TU Delft), Delft, Netherlands
|
27
|
Yu CQ, Wang XF, Li LP, You ZH, Huang WZ, Li YC, Ren ZH, Guan YJ. SGCNCMI: A New Model Combining Multi-Modal Information to Predict circRNA-Related miRNAs, Diseases and Genes. Biology (Basel) 2022; 11:1350. [PMID: 36138829 PMCID: PMC9495879 DOI: 10.3390/biology11091350] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/11/2022] [Revised: 08/21/2022] [Accepted: 09/08/2022] [Indexed: 11/16/2022]
Abstract
Computational prediction of miRNAs, diseases, and genes associated with circRNAs has important implications for circRNA research and provides a reference for wet-lab experiments, saving cost and time. In this study, SGCNCMI, a computational model combining multimodal information and graph convolutional neural networks, combines node similarity to form node information and then predicts associated nodes using GCN with a distributive contribution mechanism. The model can be used not only to predict circRNA–miRNA interactions at the molecular level but also to predict circRNA–cancer and circRNA–gene associations. In five-fold cross-validation, SGCNCMI achieves AUCs of 89.42%, 84.18%, and 82.44% for circRNA–miRNA, circRNA–disease, and circRNA–gene associations, respectively. SGCNCMI is one of the few models in this field and achieved the best results. In addition, in our case study, six of the top ten relationship pairs with the highest prediction scores were verified in PubMed.
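The AUC values reported here and in several later entries can be read as the probability that a randomly chosen known association is scored above a randomly chosen non-association. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from any of the listed papers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical predictions for six candidate associations.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is fine for illustration; for large candidate sets a rank-based computation would be the more usual choice.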
Affiliation(s)
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an 710123, China
- Correspondence:
| | - Xin-Fei Wang
- School of Information Engineering, Xijing University, Xi’an 710123, China
| | - Li-Ping Li
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi 830052, China
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
| | - Wen-Zhun Huang
- School of Information Engineering, Xijing University, Xi’an 710123, China
| | - Yue-Chao Li
- School of Information Engineering, Xijing University, Xi’an 710123, China
| | - Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an 710123, China
| | - Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an 710123, China
| |
|
28
|
Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: taxonomy, trends and challenges of computational models. Brief Bioinform 2022; 23:6686738. [PMID: 36056743 DOI: 10.1093/bib/bbac358] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 07/24/2022] [Accepted: 07/30/2022] [Indexed: 12/12/2022] Open
Abstract
Since the problem was proposed in the late 2000s, microRNA-disease association (MDA) predictions have been implemented based on the data fusion paradigm. Integrating diverse data sources provides a more comprehensive research perspective, but it also challenges algorithm design to generate accurate, concise and consistent representations of the fused data. After more than a decade of research progress, a relatively simple algorithm like the score function or a single computation layer may no longer be sufficient for further improving predictive performance. Advanced model design has become more frequent in recent years, particularly in the form of reasonably combining multiple algorithms, a process known as model fusion. In the current review, we present 29 state-of-the-art models and introduce the taxonomy of computational models for MDA prediction based on model fusion and non-fusion. The new taxonomy exhibits notable changes in the algorithmic architecture of models, compared with that of earlier ones in the 2017 review by Chen et al. Moreover, we discuss the progress that has been made towards overcoming the obstacles to effective MDA prediction since 2017 and elaborate on how future models can be designed according to a set of new schemas. Lastly, we analyse the strengths and weaknesses of each model category in the proposed taxonomy and propose future research directions from diverse perspectives for enhancing model performance.
Affiliation(s)
- Li Huang
- Academy of Arts and Design, Tsinghua University, Beijing, 10084, China; The Future Laboratory, Tsinghua University, Beijing, 10084, China
| | - Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China; Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
| |
|
29
|
Han G, Kuang Z, Deng L. MSCNE: Predict miRNA-Disease Associations Using Neural Network Based on Multi-Source Biological Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2926-2937. [PMID: 34410928 DOI: 10.1109/tcbb.2021.3106006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The important role of microRNA (miRNA) in human diseases has been confirmed by a number of studies. However, relying on biological experiments alone is largely undirected, which leads to higher experimental costs. In this paper, a high-efficiency algorithm is proposed that draws on a variety of biological information sources and combines a convolutional neural network (CNN) feature extractor with an extreme learning machine (ELM) classifier. Specifically, the semantic similarity of diseases, the Gaussian interaction profile kernel similarity of four types of biological information (miRNA, disease, long non-coding RNA (lncRNA) and environmental factors (EFs)), and the similarities of miRNAs are fused together. The miRNA similarity is composed of miRNA target information, sequence information, family information, and function information. Then, the dimensionality of the data set is reduced by an autoencoder (AE). Finally, deep features are extracted through the CNN, and the association between miRNA and disease is predicted by the ELM. The experimental results show that the average AUC of the multi-biological source information (MSCNE) model is 0.9630, which is higher than that of the other classic classifiers and feature extractors tested and of other existing algorithms. The results show that the MSCNE algorithm is effective at predicting miRNA-disease associations.
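The Gaussian interaction profile (GIP) kernel similarity mentioned above is usually computed from the rows of a binary association matrix. The sketch below assumes the common formulation in which the bandwidth is normalised by the mean squared norm of the profiles; the function name `gip_kernel`, the parameter `gamma_prime`, and the toy matrix are illustrative assumptions and are not taken from the MSCNE paper.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Assumes at least one non-zero entry, otherwise the bandwidth is undefined.
    """
    n = len(profiles)
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Toy miRNA-by-disease association matrix (rows: miRNAs, columns: diseases).
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
similarity = gip_kernel(associations)
```

Applying the same function to the transposed matrix would give the disease-side similarity.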
|
30
|
Ma M, Na S, Zhang X, Chen C, Xu J. SFGAE: a self-feature-based graph autoencoder model for miRNA-disease associations prediction. Brief Bioinform 2022; 23:6678419. [PMID: 36037084 DOI: 10.1093/bib/bbac340] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2022] [Revised: 07/21/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022] Open
Abstract
Increasing evidence has suggested that microRNAs (miRNAs) are important biomarkers of various diseases. Numerous graph neural network (GNN) models have been proposed for predicting miRNA-disease associations. However, the existing GNN-based methods suffer from an over-smoothing issue: the learned feature embeddings of miRNA nodes and disease nodes become indistinguishable when multiple GNN layers are stacked. This issue makes the performance of the methods sensitive to the number of layers, and significantly hurts the performance when more layers are employed. In this study, we resolve this issue with a novel self-feature-based graph autoencoder model, abbreviated as SFGAE. The key novelty of SFGAE is to construct miRNA-self embeddings and disease-self embeddings, and let them be independent of graph interactions between the two types of nodes. The novel self-feature embeddings enrich the information of typical aggregated feature embeddings, which aggregate the information from direct neighbors and hence heavily rely on graph interactions. SFGAE adopts a graph encoder with an attention mechanism to concatenate aggregated feature embeddings and self-feature embeddings, and adopts a bilinear decoder to predict links. Our experiments show that SFGAE achieves state-of-the-art performance. In particular, SFGAE improves the average AUC upon recent GAEMDA [1] on the benchmark datasets HMDD v2.0 and HMDD v3.2, and consistently performs better when fewer (e.g. 10%) training samples are used. Furthermore, SFGAE effectively overcomes the over-smoothing issue and performs stably well on deeper models (e.g. eight layers). Finally, we carry out case studies on three human diseases, colon neoplasms, esophageal neoplasms and kidney neoplasms, and perform a survival analysis using kidney neoplasm as an example. The results suggest that SFGAE is a reliable tool for predicting potential miRNA-disease associations.
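A bilinear decoder of the kind named above typically scores a miRNA-disease pair as a bilinear form of the two embeddings. The sketch below assumes a sigmoid on top of that form to obtain a probability; the function name, the two-dimensional embeddings, and the identity weight matrix are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def bilinear_score(z_mirna, weight, z_disease):
    """Return sigmoid(z_mirna^T W z_disease) as an association probability."""
    # W @ z_disease
    projected = [sum(w * d for w, d in zip(row, z_disease)) for row in weight]
    # z_mirna . (W @ z_disease)
    logit = sum(m * p for m, p in zip(z_mirna, projected))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical embeddings and an identity weight matrix (an assumption).
z_m = [0.5, -1.0]
z_d = [1.0, 2.0]
W = [[1.0, 0.0],
     [0.0, 1.0]]
score = bilinear_score(z_m, W, z_d)
```

In a trained model the matrix `W` would be learned jointly with the encoder; with the identity matrix the decoder reduces to a plain inner product.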
Affiliation(s)
- Mingyuan Ma
- Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
| | - Sen Na
- International Computer Science Institute and Department of Statistics, University of California, Berkeley, Berkeley CA, USA
| | - Xiaolu Zhang
- Department of Information Systems, City University of Hong Kong, Hong Kong, China
| | - Congzhou Chen
- Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
| | - Jin Xu
- Key Laboratory of High Confidence Software Technologies of Ministry of Education, School of Computer Science, Peking University, Beijing, China
| |
|
31
|
Zheng K, Liang Y, Liu YY, Yasir M, Wang P. A decision support system based on multi-sources information to predict piRNA–disease associations using stacked autoencoder. Soft comput 2022. [DOI: 10.1007/s00500-022-07396-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
32
|
Yang M, Huang ZA, Gu W, Han K, Pan W, Yang X, Zhu Z. Prediction of biomarker-disease associations based on graph attention network and text representation. Brief Bioinform 2022; 23:6651308. [PMID: 35901464 DOI: 10.1093/bib/bbac298] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2022] [Revised: 06/28/2022] [Accepted: 06/30/2022] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can be used to greatly expedite the identification of candidate biomarkers. RESULTS Here, we present a novel computational model named GTGenie for predicting the biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is utilized to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn the text-based representation of biomarker-disease relation from biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries for reconstructing relation matrix, with which the unknown biomarker-disease associations are predicted. Experimental results on HMDD, HMDAD and LncRNADisease data sets showed that GTGenie can obtain competitive prediction performance with other state-of-the-art methods. AVAILABILITY The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
| | - Zhi-An Huang
- Center for Computer Science and Information Technology, City University of Hong Kong Dongguan Research Institute, Dongguan, China
| | - Wenhao Gu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China; GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Kun Han
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Wenying Pan
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Xiao Yang
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
| |
|
33
|
A hybrid approach for the detection and monitoring of people having personality disorders on social networks. SOCIAL NETWORK ANALYSIS AND MINING 2022; 12:67. [PMID: 35789887 PMCID: PMC9244050 DOI: 10.1007/s13278-022-00884-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2021] [Revised: 03/11/2022] [Accepted: 05/03/2022] [Indexed: 11/19/2022]
Abstract
Research in the medical field does not stop evolving. This evolution obliges doctors to stay up to date in order to manage every situation that may occur with their patients. However, the medical field is very sensitive and requires a great deal of precision, which poses a major problem. Consequently, computer science is used to help resolve these issues. In this context, we propose in this paper an architecture that takes advantage of artificial intelligence (AI) and text mining techniques to: (i) identify individuals with personality disorder from their textual production on social networks by classifying their set of tweets into distinct classes representing respectively the presence, the category and the type of the disease and (ii) guarantee personalized monitoring by filtering inappropriate tweets according to the patient's circumstances. The first phase was achieved with a deep neural approach that benefits from: (i) CNN layers for feature extraction from the textual part, (ii) two LSTM layers to preserve long-term dependencies between different lexical units, (iii) an SVM classifier to detect the sick person using the dependency links found by the previous layer. The second phase was accomplished by applying a hybrid approach that combined linguistic and statistical techniques in order to filter inappropriate tweets according to the state of each patient. In the evaluation of our approach, we obtained an F-measure of 84% for the detection of personality disorder, 64% for the detection of the type of disease and 70% for the task of filtering inappropriate content. The obtained results are encouraging and may motivate researchers to improve on them, given the interest and importance of this line of research.
|
34
|
Ji BY, Pan LR, Zhou JR, You ZH, Peng SL. SMMDA: Predicting miRNA-Disease Associations by Incorporating Multiple Similarity Profiles and a Novel Disease Representation. BIOLOGY 2022; 11:biology11050777. [PMID: 35625505 PMCID: PMC9138858 DOI: 10.3390/biology11050777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 05/17/2022] [Accepted: 05/17/2022] [Indexed: 12/24/2022]
Abstract
Simple Summary: Predicting possible associations between miRNAs and diseases would provide new perspectives on disease diagnosis, pathogenesis, and gene therapy. In this work, considering the limited accessibility, high time consumption and high cost of traditional biological research, we presented a novel computational method called SMMDA that incorporates multiple similarity profiles and a novel disease representation to accelerate the identification of potential miRNA-disease associations. SMMDA was intended to be useful for the prediction of associations between miRNAs and diseases, and to be effective for prevention, diagnosis, treatment and prognosis of human diseases.
Abstract: Increasing evidence has suggested that microRNAs (miRNAs) are significant in research on human diseases. Predicting possible associations between miRNAs and diseases would provide new perspectives on disease diagnosis, pathogenesis, and gene therapy. However, considering the inherently time-consuming and expensive nature of traditional in vitro studies, there is an urgent need for a computational approach that would allow researchers to identify potential associations between miRNAs and diseases for further research. In this paper, we presented a novel computational method called SMMDA to predict potential miRNA-disease associations. In particular, SMMDA first utilized a new disease representation method (MeSHHeading2vec) based on the network embedding algorithm and then fused it with Gaussian interaction profile kernel similarity information of miRNAs and diseases, disease semantic similarity, and miRNA functional similarity. Secondly, SMMDA utilized a deep autoencoder network to transform the original features further to achieve a better feature representation. Finally, the ensemble learning model, XGBoost, was used as the underlying training and prediction method for SMMDA. In the results, SMMDA acquired a mean accuracy of 86.68% with a standard deviation of 0.42% and a mean AUC of 94.07% with a standard deviation of 0.23%, outperforming many previous works. Moreover, we also compared the predictive ability of SMMDA with different classifiers and different feature descriptors. In the case studies of three common human diseases, 47 (esophageal neoplasms), 48 (breast neoplasms), and 48 (colon neoplasms) of the top 50 candidate miRNAs were successfully verified by two other databases. The experimental results proved that SMMDA has a reliable ability to predict potential miRNA-disease associations. Therefore, it is anticipated that SMMDA could be an effective tool for biomedical researchers.
Affiliation(s)
- Bo-Ya Ji
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410200, China; (B.-Y.J.); (L.-R.P.)
| | - Liang-Rui Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410200, China; (B.-Y.J.); (L.-R.P.)
| | - Ji-Ren Zhou
- College of Computer Science, Northwestern Polytechnic University, Xi’an 710072, China;
| | - Zhu-Hong You
- College of Computer Science, Northwestern Polytechnic University, Xi’an 710072, China;
- Correspondence: (Z.-H.Y.); (S.-L.P.)
| | - Shao-Liang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410200, China; (B.-Y.J.); (L.-R.P.)
- Correspondence: (Z.-H.Y.); (S.-L.P.)
| |
|
35
|
Liang Y, Zhang ZQ, Liu NN, Wu YN, Gu CL, Wang YL. MAGCNSE: predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinformatics 2022; 23:189. [PMID: 35590258 PMCID: PMC9118755 DOI: 10.1186/s12859-022-04715-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Accepted: 05/05/2022] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Many long non-coding RNAs (lncRNAs) have key roles in different human biologic processes and are closely linked to numerous human diseases, according to cumulative evidence. Predicting potential lncRNA-disease associations can help to detect disease biomarkers and perform disease analysis and prevention. Establishing effective computational methods for lncRNA-disease association prediction is critical. RESULTS In this paper, we propose a novel model named MAGCNSE to predict underlying lncRNA-disease associations. We first obtain multiple feature matrices from the multi-view similarity graphs of lncRNAs and diseases using a graph convolutional network. Then, the weights are adaptively assigned to different feature matrices of lncRNAs and diseases using the attention mechanism. Next, the final representations of lncRNAs and diseases are acquired by further extracting features from the multi-channel feature matrices of lncRNAs and diseases using a convolutional neural network. Finally, we employ a stacking ensemble classifier, consisting of multiple traditional machine learning classifiers, to make the final prediction. The results of ablation studies in both representation learning methods and classification methods demonstrate the validity of each module. Furthermore, we compare the overall performance of MAGCNSE with that of six other state-of-the-art models; the results show that it outperforms the other methods. Moreover, we verify the effectiveness of using multi-view data of lncRNAs and diseases. Case studies further reveal the outstanding ability of MAGCNSE in the identification of potential lncRNA-disease associations. CONCLUSIONS The experimental results indicate that MAGCNSE is a useful approach for predicting potential lncRNA-disease associations.
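The attention step described above, in which each view's feature matrix receives an adaptive weight before the views are passed on as channels, can be sketched as a softmax over one score per view. How MAGCNSE actually learns those scores is not stated in the abstract, so the scores below are supplied by hand; `softmax`, `weight_views`, and the toy matrices are all illustrative assumptions.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def weight_views(view_matrices, view_scores):
    """Scale each view's feature matrix by its attention weight (one channel per view)."""
    weights = softmax(view_scores)
    channels = [[[w * x for x in row] for row in view]
                for w, view in zip(weights, view_matrices)]
    return channels, weights

# Two hypothetical views of the same two entities, two features each.
views = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
channels, weights = weight_views(views, view_scores=[0.0, 0.0])
```

With equal scores every view is scaled by the same factor; a larger score for one view would shift weight towards that channel.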
Affiliation(s)
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
| | - Ze-Qun Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
| | - Nian-Nian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
| | - Ya-Nan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
| | - Chang-Long Gu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Ying-Long Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
| |
|
36
|
Yu N, Liu ZP, Gao R. Predicting multiple types of MicroRNA-disease associations based on tensor factorization and label propagation. Comput Biol Med 2022; 146:105558. [PMID: 35525071 DOI: 10.1016/j.compbiomed.2022.105558] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2022] [Revised: 04/06/2022] [Accepted: 04/22/2022] [Indexed: 12/29/2022]
Abstract
MicroRNAs (miRNAs) play important regulatory roles in the pathogenesis and progression of diseases. Most existing bioinformatics methods only study miRNA-disease binary association prediction. However, there are many types of associations between miRNA and disease. In addition, the miRNA-disease-type association dataset has inherent noise and incompleteness. In this paper, a novel method based on tensor factorization and label propagation (TFLP) is proposed to alleviate the above problems. First, as an effective tensor factorization method, tensor robust principal component analysis (TRPCA) is applied to the original multiple-type miRNA-disease associations to obtain a clean and complete low-rank prediction tensor. Second, the Gaussian interaction profile (GIP) kernel is used to describe the similarity of disease pairs and the similarity of miRNA pairs. Then, they are combined with disease semantic similarity and miRNA functional similarity to obtain an integrated disease similarity network and an integrated miRNA similarity network, respectively. Finally, the low-rank association tensor and the biological similarity as auxiliary information are introduced into label propagation. The prediction performance of the algorithm is improved by iterative propagation of labeled information to unlabeled samples. Extensive experiments reveal that the proposed TFLP method outperforms other state-of-the-art methods for predicting multiple types of miRNA-disease associations. The data and source codes are available at https://github.com/nayu0419/TFLP.
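The final step above, iteratively propagating labeled information to unlabeled samples over a similarity network, can be illustrated with a generic label-propagation update, F ← α·S·F + (1 − α)·Y, where S is a row-normalised similarity matrix and Y the initial labels. This is a sketch of the general technique under that assumption, not the TFLP update itself, which also uses the low-rank tensor from TRPCA; all names and toy values are hypothetical.

```python
def label_propagation(similarity, labels, alpha=0.5, iterations=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y with row-normalised S."""
    n = len(similarity)
    n_cols = len(labels[0])
    # Row-normalise the similarity matrix so each row sums to 1.
    norm = []
    for row in similarity:
        total = sum(row)
        norm.append([v / total for v in row] if total else [0.0] * n)
    scores = [list(map(float, row)) for row in labels]
    for _ in range(iterations):
        scores = [
            [alpha * sum(norm[i][k] * scores[k][j] for k in range(n))
             + (1 - alpha) * labels[i][j]
             for j in range(n_cols)]
            for i in range(n)
        ]
    return scores

# Toy miRNA similarity (3 miRNAs) and known associations with 2 diseases.
toy_similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
toy_labels = [
    [1, 0],
    [0, 0],   # miRNA 1 has no known association
    [0, 1],
]
propagated = label_propagation(toy_similarity, toy_labels)
```

In this toy case the unlabeled miRNA 1 ends up with a higher score for disease 0 than for disease 1, because it is far more similar to miRNA 0 than to miRNA 2.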
Affiliation(s)
- Na Yu
- School of Control Science and Engineering, Shandong University, Jinan, 250061, China.
| | - Zhi-Ping Liu
- School of Control Science and Engineering, Shandong University, Jinan, 250061, China.
| | - Rui Gao
- School of Control Science and Engineering, Shandong University, Jinan, 250061, China.
| |
|
37
|
Xie W, Zheng Z, Zhang W, Huang L, Lin Q, Wong KC. SRG-vote: Predicting miRNA-gene relationships via embedding and LSTM ensemble. IEEE J Biomed Health Inform 2022; 26:4335-4344. [PMID: 35471879 DOI: 10.1109/jbhi.2022.3169542] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Targeted therapy for one gene or a set of genes has made it possible to apply precision medicine to different patients in the presence of tumor heterogeneity. However, how to regulate those genes is still problematic. MicroRNAs are one of the natural regulators of genes. Thus, a better understanding of the miRNA-gene interaction mechanism might contribute to future diagnosis, prevention, and cancer therapy. The interactions between microRNAs and genes play an essential role in molecular genetics. The in-vivo experiments validating the relationships between them are time-consuming, costly, and labor-intensive. With the development of high-throughput technology, large volumes of biological data have become available. However, extracting features from such large amounts of raw data and building a mathematical model is still a challenging topic. Machine learning and deep learning algorithms have become powerful tools in dealing with biological data. Inspired by this, in this paper, we propose a model that combines feature/embedding extraction methods, deep learning algorithms, and a voting system. We leverage doc2vec to generate sequential embeddings from molecular sequences. Geometrical embeddings were generated with role2vec, GCN, and GMM from the complex network built from similarity and pair-wise datasets. For the deep learning algorithms, we leveraged LSTM and Bi-LSTM according to the different embeddings and features. Finally, we adopted a voting system to balance results from different data sources. The results have shown that our voting system could achieve a higher AUC than the existing benchmark. The case studies demonstrate that our model could reveal potential relationships between miRNAs and genes. The source code, features, and predictive results can be downloaded at https://github.com/Xshelton/SRG-vote.
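The abstract does not say how the voting system combines its component models, so the sketch below shows only one common option, weighted soft voting over per-model probabilities for a single miRNA-gene pair. The function `soft_vote`, the default equal weights, and the 0.5 threshold are assumptions for illustration and should not be read as the SRG-vote scheme.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine per-model probabilities for one pair by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(probabilities)  # equal say for every model
    total = sum(weights)
    combined = sum(w * p for w, p in zip(weights, probabilities)) / total
    return combined, combined >= threshold

# Hypothetical outputs of three models trained on different embeddings.
combined, predicted = soft_vote([0.9, 0.6, 0.3])
```

Unequal weights would let a model trained on a more reliable data source contribute more to the final score.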
|
38
|
Yan C, Duan G, Li N, Zhang L, Wu FX, Wang J. PDMDA: predicting deep-level miRNA-disease associations with graph neural networks and sequence features. Bioinformatics 2022; 38:2226-2234. [PMID: 35150255 DOI: 10.1093/bioinformatics/btac077] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2021] [Revised: 01/18/2022] [Accepted: 02/05/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Many studies have shown that microRNAs (miRNAs) play a key role in human diseases. Meanwhile, traditional experimental methods for miRNA-disease association identification are extremely costly, time-consuming and challenging. Therefore, many computational methods have been developed to predict potential associations between miRNAs and diseases. However, those methods mainly predict the existence of miRNA-disease associations, and they cannot predict the deep-level miRNA-disease association types. RESULTS In this study, we propose a new end-to-end deep learning method (called PDMDA) to predict deep-level miRNA-disease associations with graph neural networks (GNNs) and miRNA sequence features. Based on the sequence and structural features of miRNAs, PDMDA extracts the miRNA feature representations with a fully connected network (FCN). The disease feature representations are extracted from the disease-gene network and gene-gene interaction network by a GNN model. Finally, a multilayer network with three fully connected layers and a softmax layer is designed to predict the final miRNA-disease association scores based on the concatenated feature representations of miRNAs and diseases. Note that PDMDA does not take the miRNA-disease association matrix as input to compute the Gaussian interaction profile similarity. We conduct three experiments based on six association type samples (including circulations, epigenetics, target, genetics, known associations whose types are unknown, and unknown association samples). We conduct fivefold cross-validation to assess the prediction performance of PDMDA. The area under the receiver operating characteristic curve is used as the metric. The experimental results show that PDMDA can accurately predict the deep-level miRNA-disease associations. AVAILABILITY AND IMPLEMENTATION Data and source codes are available at https://github.com/27167199/PDMDA.
Affiliation(s)
- Cheng Yan
- School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, China; School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Guihua Duan
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Na Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Lishen Zhang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon SK S7N5A9, Canada
| | - Jianxin Wang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
| |
|
39
|
Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: Relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Comput Biol Med 2022; 143:105322. [PMID: 35217342 DOI: 10.1016/j.compbiomed.2022.105322] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Revised: 02/11/2022] [Accepted: 02/13/2022] [Indexed: 12/21/2022]
Abstract
Recently, a large number of studies have indicated that circRNAs with covalently closed loops play important roles in biological processes and have potential as diagnostic biomarkers. Therefore, research on the circRNA-disease relationship is helpful in disease diagnosis and treatment. However, traditional biological verification methods require considerable labor and time costs. In this paper, we propose a new computational method (RGCNCDA) to predict circRNA-disease associations based on relational graph convolutional networks (R-GCNs). The method first integrates the circRNA similarity network, miRNA similarity network, disease similarity network and association networks among them to construct a global heterogeneous network. Then, it employs the random walk with restart (RWR) and principal component analysis (PCA) models to learn low-dimensional and high-order information from the global heterogeneous network as the topological features. Finally, a prediction model based on an R-GCN encoder and a DistMult decoder is built to predict the potential disease-associated circRNA. The predicted results demonstrate that RGCNCDA performs significantly better than the other six state-of-the-art methods in a 5-fold cross validation. Furthermore, the case study illustrates that RGCNCDA can effectively discover potential circRNA-disease associations.
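Random walk with restart (RWR), used above to extract topological features, repeatedly spreads probability mass along the edges of a network while returning to the seed node with a fixed probability. The sketch below uses the standard update p ← (1 − r)·W·p + r·p₀ with a column-normalised adjacency matrix W; the four-node path graph, the restart probability of 0.3, and the function name are illustrative assumptions rather than settings from RGCNCDA.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seed`."""
    n = len(adjacency)
    # Column-normalise so that each column is a probability distribution.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
                   for j in range(n)] for i in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p

# Toy network: a path 0 - 1 - 2 - 3, with the walk seeded at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
visit_prob = random_walk_with_restart(path_graph, seed=0)
```

Running the walk once per node and stacking the resulting vectors gives a diffusion-state matrix, which is the kind of input a dimensionality-reduction step such as PCA would then compress.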
Affiliation(s)
- Yaojia Chen
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yanpeng Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Xi Su
  - Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, China
40
Liu W, Lin H, Huang L, Peng L, Tang T, Zhao Q, Yang L. Identification of miRNA-disease associations via deep forest ensemble learning based on autoencoder. Brief Bioinform 2022; 23:6553934. [PMID: 35325038 DOI: 10.1093/bib/bbac104] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 01/11/2022] [Revised: 02/18/2022] [Accepted: 03/01/2022] [Indexed: 12/31/2022] Open
Abstract
Increasing evidence shows that the occurrence of complex human diseases is closely related to microRNA (miRNA) variation and imbalance. For this reason, predicting disease-related miRNAs is essential for the diagnosis and treatment of complex human diseases. Although some current computational methods can effectively predict potential disease-related miRNAs, the accuracy of prediction should be further improved. In our study, a new computational method via deep forest ensemble learning based on autoencoder (DFELMDA) is proposed to predict miRNA-disease associations. Specifically, a new feature representation strategy is proposed to obtain different types of feature representations (from miRNA and disease) for each miRNA-disease association. Then, two types of low-dimensional feature representations are extracted by two deep autoencoders for predicting miRNA-disease associations. Finally, two prediction scores of the miRNA-disease associations are obtained by the deep random forest and combined to determine the final results. DFELMDA is compared with several classical methods on the Human microRNA Disease Database (HMDD) dataset. Results reveal that the performance of this method is superior. The area under the receiver operating characteristic curve (AUC) values obtained by DFELMDA through 5-fold and 10-fold cross-validation are 0.9552 and 0.9560, respectively. In addition, case studies on colon, breast and lung tumors of different disease types further demonstrate the excellent ability of DFELMDA to predict disease-associated miRNAs. Performance analysis shows that DFELMDA can be used as an effective computational tool for predicting miRNA-disease associations.
Affiliation(s)
- Wei Liu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Hui Lin
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Li Huang
  - Academy of Arts and Design, Tsinghua University, Beijing, 10084, China
  - The Future Laboratory, Tsinghua University, Beijing, 10084, China
- Li Peng
  - School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Ting Tang
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Li Yang
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
41
Yu L, Zheng Y, Ju B, Ao C, Gao L. Research progress of miRNA-disease association prediction and comparison of related algorithms. Brief Bioinform 2022; 23:6542222. [PMID: 35246678 DOI: 10.1093/bib/bbac066] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/25/2021] [Revised: 01/30/2022] [Accepted: 02/08/2022] [Indexed: 11/13/2022] Open
Abstract
With an in-depth understanding of noncoding ribonucleic acid (RNA), many studies have shown that microRNA (miRNA) plays an important role in human diseases. Because traditional biological experiments are time-consuming and laborious, new calculation methods have recently been developed to predict associations between miRNA and diseases. In this review, we collected various miRNA-disease association prediction models proposed in recent years and used two common data sets to evaluate the performance of the prediction models. First, we systematically summarized the commonly used databases and similarity data for predicting miRNA-disease associations, and then divided the various calculation models into four categories for summary and detailed introduction. In this study, two independent datasets (D5430 and D6088) were compiled to systematically evaluate 11 publicly available prediction tools for miRNA-disease associations. The experimental results indicate that the methods based on information dissemination and the method based on scoring function require shorter running time. The method based on matrix transformation often requires a longer running time, but the overall prediction result is better than the previous two methods. We hope that the summary of work related to miRNA and disease will provide comprehensive knowledge for predicting the relationship between miRNA and disease and contribute to advanced computation tools in the future.
Affiliation(s)
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Yujia Zheng
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Bingyi Ju
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Chunyan Ao
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an, China
42
Bi XA, Xing Z, Zhou W, Li L, Xu L. Pathogeny Detection for Mild Cognitive Impairment via Weighted Evolutionary Random Forest with Brain Imaging and Genetic Data. IEEE J Biomed Health Inform 2022; 26:3068-3079. [PMID: 35157601 DOI: 10.1109/jbhi.2022.3151084] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
Medical imaging technology and gene sequencing technology have long been widely used to analyze the pathogenesis and make precise diagnoses of mild cognitive impairment (MCI). However, few studies involve the fusion of radiomics data with genomics data to make full use of the complementarity between different omics to detect pathogenic factors of MCI. This paper performs multimodal fusion analysis based on functional magnetic resonance imaging (fMRI) data and single nucleotide polymorphism (SNP) data of MCI patients. Specifically, the fusion features of samples are first constructed by applying correlation analysis methods to the sequence information of regions of interest (ROIs) and digitalized gene sequences. Then, by introducing a weighted evolution strategy into ensemble learning, a novel weighted evolutionary random forest (WERF) model is built to eliminate inefficient features. Consequently, with the help of WERF, an overall multimodal data analysis framework is established to effectively identify MCI patients and extract pathogenic factors. The superior performance of the framework is verified on data of MCI patients from the ADNI database and in comparison with some existing popular methods. Our study has great potential to be an effective tool for detecting pathogenic factors of MCI.
43
Jiang H, Huang Y. An effective drug-disease associations prediction model based on graphic representation learning over multi-biomolecular network. BMC Bioinformatics 2022; 23:9. [PMID: 34983364 PMCID: PMC8726520 DOI: 10.1186/s12859-021-04553-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/28/2021] [Accepted: 12/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, up to now, there are still few DDAs verified by experiments. Previous evidence indicates that combining information would be conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem. RESULTS: In this paper, we proposed a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that the GRLMN model was very accurate, with an area under the ROC curve (AUC) of 87.9%, outperforming all previous works in terms of both accuracy and AUC on the benchmark dataset. To further verify the high performance of GRLMN, we carried out two case studies for two common diseases. As a result, in the ranking of drugs that were predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. CONCLUSIONS: The experimental results show that our model has good performance in the prediction of DDAs. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies concerning their participation in drug repositioning.
Affiliation(s)
- Hanjing Jiang
  - Key Laboratory of Image Information Processing and Intelligent Control of Education Ministry of China, Institute of Artificial Intelligence, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yabing Huang
  - Department of Pathology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei, China
44
Huang Z, Han Y, Liu L, Cui Q, Zhou Y. LE-MDCAP: A Computational Model to Prioritize Causal miRNA-Disease Associations. Int J Mol Sci 2021; 22:ijms222413607. [PMID: 34948403 PMCID: PMC8706837 DOI: 10.3390/ijms222413607] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/15/2021] [Revised: 12/12/2021] [Accepted: 12/14/2021] [Indexed: 01/03/2023] Open
Abstract
MicroRNAs (miRNAs) are associated with various complex human diseases and some miRNAs can be directly involved in the mechanisms of disease. Identifying disease-causative miRNAs can provide novel insight into disease pathogenesis from a miRNA perspective and facilitate disease treatment. To date, various computational models have been developed to predict general miRNA-disease associations, but few models are available to further prioritize causal miRNA-disease associations from non-causal associations. Therefore, in this study, we constructed a Levenshtein-Distance-Enhanced miRNA-disease Causal Association Predictor (LE-MDCAP) to predict potential causal miRNA-disease associations. Specifically, Levenshtein distance matrices covering the sequence, expression and functional miRNA similarities were introduced to enhance the previous Gaussian interaction profile kernel-based similarity matrix. LE-MDCAP integrated miRNA similarity matrices, a disease semantic similarity matrix and known causal miRNA-disease associations to make predictions. For the regular causal vs. non-disease association discrimination task, LE-MDCAP achieved an area under the receiver operating characteristic curve (AUROC) of 0.911 and 0.906 in 10-fold cross-validation and independent test, respectively. More importantly, LE-MDCAP prominently outperformed the previous MDCAP model in distinguishing causal versus non-causal miRNA-disease associations (AUROC 0.820 vs. 0.695). Case studies performed on diabetic retinopathy and hsa-mir-361 also validated the accuracy of our model. In summary, LE-MDCAP could be useful for screening causal miRNA-disease associations from general miRNA-disease associations.
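As a rough illustration of the Levenshtein distance that this abstract builds on, the sketch below computes the classic edit distance between two strings with dynamic programming. How LE-MDCAP applies the distance to expression and functional similarities is not described in the abstract, so only the textbook string case is shown; the function name and the example sequences are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical short RNA-like strings, for illustration only.
distance = levenshtein("UGAGGUAG", "UGAGGUUG")
```

A pairwise matrix of such distances over all miRNA sequences is one plausible way to obtain a sequence-level distance matrix, though that construction is an assumption here.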
45
Wang L, You ZH, Li JQ, Huang YA. IMS-CDA: Prediction of CircRNA-Disease Associations From the Integration of Multisource Similarity Information With Deep Stacked Autoencoder Model. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:5522-5531. [PMID: 33027025 DOI: 10.1109/tcyb.2020.3022852] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 06/11/2023]
Abstract
Emerging evidence indicates that circular RNA (circRNA) plays an indispensable role in the pathogenesis of complex human diseases and in many critical biological processes. Using circRNA as a molecular marker or therapeutic target opens up a new avenue for the treatment and detection of complex human diseases. Traditional biological experiments, however, are usually limited to small scale and are time consuming, so the development of an effective and feasible computational approach for predicting circRNA-disease associations is increasingly favored. In this study, we propose a new computational method, called IMS-CDA, to predict potential circRNA-disease associations based on multisource biological information. More specifically, IMS-CDA combines the information from the disease semantic similarity and the Jaccard and Gaussian interaction profile kernel similarities of disease and circRNA, and extracts the hidden features using the stacked autoencoder (SAE) algorithm of deep learning. After training the rotation forest (RF) classifier, IMS-CDA achieves an area under the ROC curve of 88.08% with 88.36% accuracy at a sensitivity of 91.38% on the CIRCR2Disease dataset. Compared with the state-of-the-art support vector machine and K-nearest neighbor models and different descriptor models, IMS-CDA achieves the best overall performance. In the case studies, eight of the top 15 circRNA-disease associations with the highest prediction scores were confirmed by recent literature. These results indicate that IMS-CDA has an outstanding ability to predict new circRNA-disease associations and can provide reliable candidates for biological experiments.
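The two interaction-profile similarities named in this abstract can be sketched as follows. The code assumes the commonly used definitions: Jaccard similarity as intersection over union of binary profiles, and the Gaussian interaction profile (GIP) kernel exp(−γ‖u − v‖²) with γ set to γ′ divided by the mean squared profile norm. The toy association matrix and the default γ′ = 1 are assumptions, not values from IMS-CDA.

```python
import math


def jaccard(u, v):
    """Jaccard similarity of two binary interaction profiles."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel matrix for a list of profiles."""
    n = len(profiles)
    # Bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical association matrix: rows are circRNAs, columns are diseases.
toy_profiles = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
kernel = gip_kernel(toy_profiles)
```

Rows that share more known associations receive higher similarity under both measures, which is why such profiles are a common starting point when little other information is available.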
46
Yi HC, You ZH, Guo ZH, Huang DS, Chan KCC. Learning Representation of Molecules in Association Network for Predicting Intermolecular Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2546-2554. [PMID: 32070992 DOI: 10.1109/tcbb.2020.2973091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or a very limited number of molecule types. In this study, we propose a network representation learning-based computational framework, MAN-SDNE, to predict any intermolecular associations. More specifically, we constructed a large-scale molecular association network of multiple human biomolecules by integrating associations among long non-coding RNA, microRNA, protein, drug, and disease, containing 6,528 molecular nodes and 105,546 associations of 9 kinds. Then, each node is represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves a remarkable performance with an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To indicate the ability to predict specific types of interactions, a case study on predicting lncRNA-protein interactions using MAN-SDNE is also carried out. Experimental results demonstrate that this work offers systematic insight for understanding the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.
47
Xuan P, Wang D, Cui H, Zhang T, Nakaguchi T. Integration of pairwise neighbor topologies and miRNA family and cluster attributes for miRNA-disease association prediction. Brief Bioinform 2021; 23:6385813. [PMID: 34634106 DOI: 10.1093/bib/bbab428] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/14/2021] [Revised: 09/01/2021] [Accepted: 09/19/2021] [Indexed: 12/14/2022] Open
Abstract
Identifying disease-related microRNAs (miRNAs) assists the understanding of disease pathogenesis. Existing research methods integrate multiple kinds of data related to miRNAs and diseases to infer candidate disease-related miRNAs. The attributes of miRNA nodes including their family and cluster belonging information, however, have not been deeply integrated. Besides, the learning of neighbor topology representation of a pair of miRNA and disease is a challenging issue. We present a disease-related miRNA prediction method by encoding and integrating multiple representations of miRNA and disease nodes learnt from the generative and adversarial perspective. We firstly construct a bilayer heterogeneous network of miRNA and disease nodes, and it contains multiple types of connections among these nodes, which reflect neighbor topology of miRNA-disease pairs, and the attributes of miRNA nodes, especially miRNA-related families and clusters. To learn enhanced pairwise neighbor topology, we propose a generative and adversarial model with a convolutional autoencoder-based generator to encode the low-dimensional topological representation of the miRNA-disease pair and multi-layer convolutional neural network-based discriminator to discriminate between the true and false neighbor topology embeddings. Besides, we design a novel feature category-level attention mechanism to learn the various importance of different features for final adaptive fusion and prediction. Comparison results with five miRNA-disease association methods demonstrated the superior performance of our model and technical contributions in terms of area under the receiver operating characteristic curve and area under the precision-recall curve. The results of recall rates confirmed that our model can find more actual miRNA-disease associations among top-ranked candidates. Case studies on three cancers further proved the ability to detect potential candidate miRNAs.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Dong Wang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
48
Zheng K, You ZH, Wang L, Li YR, Zhou JR, Zeng HT. MISSIM: An Incremental Learning-Based Model With Applications to the Prediction of miRNA-Disease Association. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1733-1742. [PMID: 32749964 DOI: 10.1109/tcbb.2020.3013837] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/11/2023]
Abstract
In the past few years, prediction models have shown remarkable performance in most biological correlation prediction tasks. These tasks traditionally use a fixed dataset, and the model, once trained, is deployed as is. These models often encounter training issues such as sensitivity to hyperparameter tuning and "catastrophic forgetting" when adding new data. However, with the development of biomedicine and the accumulation of biological data, new predictive models are required to face the challenge of adapting to change. To this end, we propose a computational approach based on the Broad learning system (BLS) to predict potential disease-associated miRNAs that retains the ability to distinguish prior training associations when new data need to be adapted. In particular, we introduce incremental learning to the field of biological association prediction for the first time and propose a new method for quantifying sequence similarity. In the performance evaluation, the AUC in the 5-fold cross-validation was 0.9400 +/- 0.0041. To better assess the effectiveness of MISSIM, we compared it with various classifiers and former prediction models. Its performance is superior to that of the previous methods. Besides, the case study on identifying miRNAs associated with breast neoplasms, lung neoplasms and esophageal neoplasms shows that 34, 36 and 35 out of the top 40 associations predicted by MISSIM are confirmed by recent biomedical resources. These results provide ample, convincing evidence that this approach has potential value and prospects in promoting biomedical research productivity.
49
Qian Y, Zhang Y, Zhang J. Alignment-Free Sequence Comparison With Multiple k Values. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1841-1849. [PMID: 31765317 DOI: 10.1109/tcbb.2019.2955081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Alignment-free sequence comparison approaches have become increasingly popular in computational biology, because alignment-based approaches are inefficient to process large-scale datasets. Still, there is no way to determine the optimal value of the critical parameter k for alignment-free approaches in general. In this article, we tried to solve the problem by involving multiple k values simultaneously. The method counts the occurrence of each k-mer with different k values in a sequence. Two weighting schemes, based on maximizing deviation method and genetic algorithm, are then used on these counts. We applied the method to enhance the three common alignment-free approaches D2, D2S, and D2*, and evaluated its performance on similarity search and functionally related regulatory sequences recognition. The enhanced approaches achieve better performance than the original approaches in all cases, and much better performance than some other common measures, such as Pcc, Eu, Ma, Ch, Kld, and Cos.
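The idea of counting k-mers at several k values can be sketched as below. The code uses the basic D2 statistic (the inner product of two k-mer count vectors) and combines several k values with fixed weights. The fixed weights are a placeholder assumption standing in for the paper's maximizing-deviation and genetic-algorithm weighting schemes, which are not reproduced here; all function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def d2(x, y, k):
    """Basic D2 statistic: sum over shared k-mers of the product of their counts."""
    counts_x, counts_y = kmer_counts(x, k), kmer_counts(y, k)
    return sum(counts_x[w] * counts_y[w] for w in counts_x if w in counts_y)


def multi_k_d2(x, y, weights):
    """Weighted sum of D2 over several k values; `weights` maps k to its weight."""
    return sum(weight * d2(x, y, k) for k, weight in weights.items())


# Placeholder uniform weights over k = 1 and k = 2 (assumption, not learned).
score = multi_k_d2("ACGT", "ACGT", {1: 0.5, 2: 0.5})
```

Because D2 grows with sequence length and with k-mer abundance, a practical comparison would normally standardise the counts first, as the D2S and D2* variants mentioned in the abstract do.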
50
Ji BY, You ZH, Wang Y, Li ZW, Wong L. DANE-MDA: Predicting microRNA-disease associations via deep attributed network embedding. iScience 2021; 24:102455. [PMID: 34041455 PMCID: PMC8141887 DOI: 10.1016/j.isci.2021.102455] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/12/2020] [Revised: 03/02/2021] [Accepted: 04/19/2021] [Indexed: 12/24/2022] Open
Abstract
Predicting microRNA-disease associations by using computational methods is conducive to improving the efficiency of costly and laborious traditional bio-experiments. In this study, we propose a computational machine learning-based method (DANE-MDA) that preserves integrated structure and attribute features via deep attributed network embedding to predict potential miRNA-disease associations. Specifically, the integrated features are extracted by using a deep stacked auto-encoder on the diverse orders of matrices containing structure and attribute information and are then used to train a random forest classifier. Under 5-fold cross-validation experiments, DANE-MDA yielded average accuracy, sensitivity, and AUC of 85.59%, 84.23%, and 0.9264 on the HMDD v3.0 dataset, and 83.21%, 80.39%, and 0.9113 on the HMDD v2.0 dataset, respectively. Additionally, case studies on breast, colon, and lung neoplasms show that 47, 47, and 46 of the top 50 miRNAs can be predicted and retrieved in the other database.
Affiliation(s)
- Bo-Ya Ji
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Yi Wang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zheng-Wei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Leon Wong
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China